Posted to notifications@ignite.apache.org by GitBox <gi...@apache.org> on 2020/04/01 22:03:33 UTC

[GitHub] [ignite] Mmuzaf opened a new pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Mmuzaf opened a new pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409103164
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -361,19 +356,19 @@ public static String partDeltaFileName(int partId) {
         MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
 
         mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
-            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+            "The system time approximated by 10 ms of the last started cluster snapshot request on this node.");
         mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
-            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+            "The system time approximated by 10 ms of the last started cluster snapshot request on this node.");
 
 Review comment:
   Fixed.


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409024440
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory can run out before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
 
 Review comment:
   Fixed.
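
   As an aside, the per-thread direct buffer pattern described in the `locBuff` javadoc above can
   be illustrated in isolation. Below is a minimal sketch using plain JDK APIs; the fixed page
   size is an assumption (the real code reads it from DataStorageConfiguration):

       import java.nio.ByteBuffer;
       import java.nio.ByteOrder;

       class PageBuffers {
           /** Assumed page size; the real code reads it from the data storage configuration. */
           static final int PAGE_SIZE = 4096;

           /**
            * One direct buffer per thread: direct memory is reclaimed only when the
            * ByteBuffer is garbage collected, so allocating a buffer per writer could
            * exhaust off-heap memory long before a GC cycle runs.
            */
           static final ThreadLocal<ByteBuffer> LOC_BUFF = ThreadLocal.withInitial(() ->
               ByteBuffer.allocateDirect(PAGE_SIZE).order(ByteOrder.nativeOrder()));

           /** Copies one page through the thread-local buffer, reusing it across calls. */
           static ByteBuffer copyPage(ByteBuffer srcPage) {
               ByteBuffer buff = LOC_BUFF.get();

               buff.clear();      // reuse the same buffer for every page this thread copies
               buff.put(srcPage);
               buff.flip();

               return buff;
           }
       }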


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r407971814
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory can run out before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not already stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverse the snapshot directory for the given local node folder name and
+     * recursively delete all files from it, if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
 
 Review comment:
   It will be much simpler to understand if you delete only explicitly listed dirs:
   snapshotName/binary_meta/nodeName
   snapshotName/db/nodeName
   etc.
   In the current implementation, if you name a node "db", for example, you will lose all other nodes' data.
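
   A minimal sketch of this suggestion (the directory layout and helper name below are
   assumptions drawn from the comment, not the PR's actual code):

       import java.io.File;
       import java.io.IOException;
       import java.io.UncheckedIOException;
       import java.nio.file.Files;
       import java.nio.file.Path;
       import java.nio.file.Paths;
       import java.util.Comparator;
       import java.util.stream.Stream;

       class SnapshotCleaner {
           /** Deletes only the known per-node directories of a snapshot, leaving other nodes' data intact. */
           static void deleteSnapshotForNode(File snpDir, String folderName) throws IOException {
               // Only these explicitly listed locations belong to the given node folder.
               Path[] nodeDirs = {
                   snpDir.toPath().resolve(Paths.get("binary_meta", folderName)),
                   snpDir.toPath().resolve(Paths.get("db", folderName))
               };

               for (Path dir : nodeDirs) {
                   if (!Files.exists(dir))
                       continue;

                   // Reverse lexicographic order deletes children before their parents.
                   try (Stream<Path> files = Files.walk(dir)) {
                       files.sorted(Comparator.reverseOrder()).forEach(p -> {
                           try {
                               Files.delete(p);
                           }
                           catch (IOException e) {
                               throw new UncheckedIOException(e);
                           }
                       });
                   }
               }
           }
       }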


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408340291
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotMXBeanImpl.java
 ##########
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.List;
+import org.apache.ignite.internal.GridKernalContextImpl;
+import org.apache.ignite.mxbean.SnapshotMXBean;
+
+/**
+ * Snapshot MBean features.
+ */
+public class SnapshotMXBeanImpl implements SnapshotMXBean {
+    /** Instance of snapshot cache shared manager. */
+    private final IgniteSnapshotManager mgr;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotMXBeanImpl(GridKernalContextImpl ctx) {
+        mgr = ctx.cache().context().snapshotMgr();
+    }
+
+    /** {@inheritDoc} */
+    @Override public void createSnapshot(String snpName) {
+        mgr.createSnapshot(snpName).get();
 
 Review comment:
   I think the `in-progress` state should be enough for such a case. I've changed the code; please take a look.
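
   For reference, a hedged sketch of what such an `in-progress` guard could look like inside
   `SnapshotMXBeanImpl.createSnapshot` (it reuses `isSnapshotCreating()` and `SNP_IN_PROGRESS_ERR_MSG`
   from this PR; the exact shape of the committed code may differ):

   // In SnapshotMXBeanImpl (sketch); requires: import org.apache.ignite.IgniteException;
   /** {@inheritDoc} */
   @Override public void createSnapshot(String snpName) {
       // Reject the request while another snapshot operation is running,
       // instead of tracking any extra MBean-side state.
       if (mgr.isSnapshotCreating())
           throw new IgniteException(IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG);

       mgr.createSnapshot(snpName).get();
   }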


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409738469
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
 ##########
 @@ -0,0 +1,881 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BooleanSupplier;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.pagemem.PageIdUtils;
+import org.apache.ignite.internal.pagemem.store.PageStore;
+import org.apache.ignite.internal.pagemem.store.PageWriteListener;
+import org.apache.ignite.internal.processors.cache.CacheGroupContext;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopology;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.lang.IgniteThrowableRunner;
+import org.apache.ignite.internal.util.tostring.GridToStringExclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.partDeltaFile;
+
+/**
+ *
+ */
+class SnapshotFutureTask extends GridFutureAdapter<Boolean> implements DbCheckpointListener {
+    /** Shared context. */
+    private final GridCacheSharedContext<?, ?> cctx;
+
+    /** Ignite logger. */
+    private final IgniteLogger log;
+
+    /** Node id which caused the snapshot operation. */
+    private final UUID srcNodeId;
+
+    /** Unique identifier of snapshot process. */
+    private final String snpName;
+
+    /** Snapshot working directory on file system. */
+    private final File tmpTaskWorkDir;
+
+    /** Local buffer to perform copy-on-write operations for {@link PageStoreSerialWriter}. */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** IO factory which will be used for creating snapshot delta-writers. */
+    private final FileIOFactory ioFactory;
+
+    /**
+     * The file size of each cache partition file.
+     * The value is greater than zero only for partitions in the OWNING state.
+     * The information is collected under the checkpoint write lock.
+     */
+    private final Map<GroupPartitionId, Long> partFileLengths = new HashMap<>();
+
+    /**
+     * Map of partitions to snapshot and their corresponding delta PageStores.
+     * Writers are pinned to the snapshot context by the supplier that controls
+     * partition processing.
+     */
+    private final Map<GroupPartitionId, PageStoreSerialWriter> partDeltaWriters = new HashMap<>();
+
+    /** Snapshot data sender. */
+    @GridToStringExclude
+    private final SnapshotSender snpSndr;
+
+    /**
+     * Requested map of cache groups and their partitions to include into the snapshot. If the set
+     * of partitions is {@code null}, then all OWNING partitions for the given cache groups will be
+     * included into the snapshot. In this case, if all partitions are in the OWNING state, the index
+     * partition will also be included.
+     * <p>
+     * If partitions for a particular cache group are not provided, they will be collected and added
+     * on checkpoint under the write lock.
+     */
+    private final Map<Integer, Set<Integer>> parts;
+
+    /** Cache group and corresponding partitions collected under the checkpoint write lock. */
+    private final Map<Integer, Set<Integer>> processed = new HashMap<>();
+
+    /** Checkpoint end future. */
+    private final CompletableFuture<Boolean> cpEndFut = new CompletableFuture<>();
+
+    /** Future to wait on until the checkpoint mark phase is finished and snapshot tasks are scheduled. */
+    private final GridFutureAdapter<Void> startedFut = new GridFutureAdapter<>();
+
+    /** Absolute snapshot storage path. */
+    private File tmpSnpDir;
+
+    /** Future which will be completed when the task is requested to be closed. Will be executed on the system pool. */
+    private volatile CompletableFuture<Void> closeFut;
+
+    /** An exception which occurred during snapshot processing. */
+    private final AtomicReference<Throwable> err = new AtomicReference<>();
+
+    /** Flag indicating that the task is already scheduled on checkpoint. */
+    private final AtomicBoolean started = new AtomicBoolean();
+
+    /**
+     * @param e Finished snapshot task future with particular exception.
+     */
+    public SnapshotFutureTask(IgniteCheckedException e) {
+        A.notNull(e, "Exception for a finished snapshot task must not be null");
+
+        cctx = null;
+        log = null;
+        snpName = null;
+        srcNodeId = null;
+        tmpTaskWorkDir = null;
+        snpSndr = null;
+
+        err.set(e);
+        startedFut.onDone(e);
+        onDone(e);
+        parts = null;
+        ioFactory = null;
+        locBuff = null;
+    }
+
+    /**
+     * @param snpName Unique identifier of snapshot task.
+     * @param ioFactory Factory for working with delta files as file storage.
+     * @param parts Map of cache groups and their partitions to include into the snapshot. If the set
+     * of partitions is {@code null}, then all OWNING partitions for the given cache groups will be included.
+     */
+    public SnapshotFutureTask(
+        GridCacheSharedContext<?, ?> cctx,
+        UUID srcNodeId,
+        String snpName,
+        File tmpWorkDir,
+        FileIOFactory ioFactory,
+        SnapshotSender snpSndr,
+        Map<Integer, Set<Integer>> parts,
+        ThreadLocal<ByteBuffer> locBuff
+    ) {
+        A.notNull(snpName, "Snapshot name cannot be empty or null");
+        A.notNull(snpSndr, "Snapshot sender which handles execution tasks must not be null");
+        A.notNull(snpSndr.executor(), "Executor service must not be null");
+
+        this.parts = parts;
+        this.cctx = cctx;
+        this.log = cctx.logger(SnapshotFutureTask.class);
+        this.snpName = snpName;
+        this.srcNodeId = srcNodeId;
+        this.tmpTaskWorkDir = new File(tmpWorkDir, snpName);
+        this.snpSndr = snpSndr;
+        this.ioFactory = ioFactory;
+        this.locBuff = locBuff;
+    }
+
+    /**
+     * @return Snapshot name.
+     */
+    public String snapshotName() {
+        return snpName;
+    }
+
+    /**
+     * @return Node id which triggered this operation.
+     */
+    public UUID sourceNodeId() {
+        return srcNodeId;
+    }
+
+    /**
+     * @return Type of snapshot operation.
+     */
+    public Class<? extends SnapshotSender> type() {
+        return snpSndr.getClass();
+    }
+
+    /**
+     * @return Set of cache groups included into snapshot operation.
+     */
+    public Set<Integer> affectedCacheGroups() {
+        return parts.keySet();
+    }
+
+    /**
+     * @param th An exception which occurred during snapshot processing.
+     */
+    public void acceptException(Throwable th) {
+        if (th == null)
+            return;
+
+        if (err.compareAndSet(null, th))
+            closeAsync();
+
+        startedFut.onDone(th);
+
+        U.log(log, "Snapshot task has accepted an exception to stop itself: " + th);
+    }
+
+    /** {@inheritDoc} */
+    @Override public boolean onDone(@Nullable Boolean res, @Nullable Throwable err) {
+        for (PageStoreSerialWriter writer : partDeltaWriters.values())
+            U.closeQuiet(writer);
+
+        snpSndr.close(err);
+
+        if (tmpSnpDir != null)
+            U.delete(tmpSnpDir);
+
+        // Delete the snapshot directory if no other files exist.
+        try {
+            if (U.fileCount(tmpTaskWorkDir.toPath()) == 0 || err != null)
+                U.delete(tmpTaskWorkDir.toPath());
+        }
+        catch (IOException e) {
+            log.error("Snapshot directory doesn't exist [snpName=" + snpName + ", dir=" + tmpTaskWorkDir + ']');
+        }
+
+        if (err != null)
+            startedFut.onDone(err);
+
+        return super.onDone(res, err);
+    }
+
+    /**
+     * @throws IgniteCheckedException If fails.
+     */
+    public void awaitStarted() throws IgniteCheckedException {
+        startedFut.get();
+    }
+
+    /**
+     * @return {@code true} if current task requested to be stopped.
+     */
+    private boolean stopping() {
+        return err.get() != null;
+    }
+
+    /**
+     * Initiates snapshot task.
+     *
+     * @return {@code true} if task started by this call.
+     */
+    public boolean start() {
+        if (stopping())
+            return false;
+
+        try {
+            if (!started.compareAndSet(false, true))
+                return false;
+
+            tmpSnpDir = U.resolveWorkDirectory(tmpTaskWorkDir.getAbsolutePath(),
+                databaseRelativePath(cctx.kernalContext().pdsFolderResolver().resolveFolders().folderName()),
+                false);
+
+            for (Integer grpId : parts.keySet()) {
+                CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);
 
 Review comment:
   I've fixed this behavior. Tests for cluster and remote snapshot have been added too.


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410166763
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/IgniteCacheDatabaseSharedManager.java
 ##########
 @@ -156,7 +156,6 @@
     /** First eviction was warned flag. */
     private volatile boolean firstEvictWarn;
 
-
 
 Review comment:
   No changes in this class except indent and NL fix


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409044229
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}): that would be redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, and the JVM can run out of
+     * off-heap memory before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Busy lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and
+                        // the distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Concurrently traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group and partition pairs to be included into the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
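+
+    // Illustrative usage of the public snapshot API above (a minimal sketch, not
+    // part of this patch). It assumes the IgniteSnapshot facade is exposed via
+    // Ignite#snapshot() and that "config/ignite.xml" is a hypothetical node config:
+    //
+    //   try (Ignite ignite = Ignition.start("config/ignite.xml")) {
+    //       ignite.cluster().state(ClusterState.ACTIVE);
+    //
+    //       // Blocks until every baseline node stores its consistent copy.
+    //       ignite.snapshot().createSnapshot("backup23012020").get();
+    //   }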
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
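+
+    // A sketch of the intended ordering (inferred from the surrounding code, not
+    // part of this patch): the local task is attached to a forced checkpoint while
+    // the exchange still holds the topology lock, so partition files are copied
+    // from a checkpoint-consistent state and later page updates go to delta files.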
+
+    /**
+     * @param parts Map of cache group IDs to the partitions to be included in the snapshot.
+     * @param rmtNodeId The remote node to connect to.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
 
 Review comment:
   Fixed.


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409381244
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take snapshot from the whole cluster and check snapshot consistency.
+     * Note: client nodes and server nodes not in the baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
 
 Review comment:
   Point


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408970551
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of files with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default temporary snapshot directory name for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node stops. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path, including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread: creating a buffer per
+     * {@code SnapshotFutureTask.PageStoreSerialWriter} is redundant and can lead to OOM errors, since a direct
+     * buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may run out before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Distributed procedure to start the snapshot operation. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Distributed procedure to check the previously performed snapshot operation and delete incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process has occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
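+
+    // A sketch of the cluster-wide snapshot flow driven by the two distributed
+    // processes registered above (assuming all baseline nodes stay alive):
+    //   1. START_SNAPSHOT: each baseline node registers a local SnapshotFutureTask
+    //      and schedules it on a forced checkpoint during the exchange.
+    //   2. The task copies partition files while concurrent page updates are
+    //      written to .delta files and merged into the copies afterwards.
+    //   3. END_SNAPSHOT: the coordinator collects responses from all baseline
+    //      nodes; on any error the incomplete snapshot is deleted on every node.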
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
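+
+    // For example, assuming the default partition file template "part-%d.bin",
+    // partition 5 maps to "part-5.bin.delta" and the index partition maps to
+    // "index.bin.delta".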
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread, so no ordering guarantees are violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files (they may be removed concurrently).
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids with the corresponding partitions to be included in the snapshot
+        // (a null value means all partitions of the group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
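+            // Nodes which neither responded nor reported an error are considered to have left the cluster.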
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
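+        // Baseline nodes which did not respond to the end stage are considered failed.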
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because of a local node crash must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the next checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids with the corresponding partitions to be snapshot.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        // The support check must follow the null check to avoid an NPE when the node has left.
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
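+            // Only one remote snapshot request can be processed at a time; install the new future atomically.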
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group ids with the corresponding partitions to be snapshot.
+     * @param snpSndr Snapshot sender instance which writes or transmits the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id which requested the snapshot.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Page stores of the partitions being received, mapped by group and partition id. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter showing how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file stores processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close file stores that were not fully processed.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
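+            // Delete the temporary directory with partially received snapshot files.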
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks sequentially, though not necessarily on a single thread.
+     * It is important for some {@link SnapshotSender}s to process their sub-tasks
+     * sequentially, since all of these sub-tasks may share a single socket channel
+     * to send data to.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
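+            // Wrap the task so that finishing it schedules the next queued task.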
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be null.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory calculated on snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
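+                // Mark the snapshot as running; an incomplete snapshot is removed on restart (see onReadyForRead).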
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
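+                // Apply the delta pages, page by page, on top of the copied partition file.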
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Copy from file.
+         * @param to Copy data to file.
+         * @param length Number of bytes to copy from beginning.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
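+                // transferTo() may copy fewer bytes than requested, so loop until the whole length is written.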
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        @GridToStringInclude
+        /** The list of cache groups to include into snapshot. */
 
 Review comment:
   Please reorder comment and annotation.
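  
   For reference, the suggested ordering would be (the field declaration itself is truncated
   in the hunk above; judging by its usage in the request handling it is the cache group id list):
  
       /** The list of cache groups to include into snapshot. */
       @GridToStringInclude
       private final List<Integer> grpIds;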

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409418709
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
 
 Review comment:
   There should be a test added for transactional consistency, i.e. (see the sketch below):
   1. Start concurrent transactional activity with some checkable invariant (for example, set a random balance for each account on init, then concurrently transfer random sums from one account to another; invariant: the total sum over all accounts should be the same at any time).
   2. Take a snapshot.
   3. Stop the transactional activity.
   4. Restore from the snapshot and check the invariant.
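  
   A minimal sketch of such a test, assuming the persistent data region configured by
   AbstractSnapshotSelfTest. The restore step is only outlined in a comment, since this PR
   does not add a public restore API:
  
        /** Sketch: the total balance over all accounts must stay the same under concurrent transfers. */
        @Test
        public void testClusterSnapshotUnderTransactionalLoad() throws Exception {
            int accounts = 100;
            long initBalance = 1_000L;
  
            Ignite ignite = startGrids(2);
  
            ignite.cluster().state(ACTIVE);
  
            IgniteCache<Integer, Long> cache = ignite.getOrCreateCache(
                new CacheConfiguration<Integer, Long>("accounts")
                    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL));
  
            for (int i = 0; i < accounts; i++)
                cache.put(i, initBalance);
  
            AtomicBoolean stop = new AtomicBoolean();
  
            // Concurrently transfer random sums between random pairs of accounts.
            IgniteInternalFuture<?> loadFut = GridTestUtils.runAsync(() -> {
                Random rnd = new Random();
  
                while (!stop.get()) {
                    int from = rnd.nextInt(accounts);
                    int to = rnd.nextInt(accounts);
  
                    if (from == to)
                        continue;
  
                    // Lock accounts in ascending key order to avoid deadlocks
                    // with the default pessimistic transactions.
                    int a = Math.min(from, to);
                    int b = Math.max(from, to);
  
                    try (Transaction tx = ignite.transactions().txStart()) {
                        long amount = rnd.nextInt(100);
  
                        cache.put(a, cache.get(a) - amount);
                        cache.put(b, cache.get(b) + amount);
  
                        tx.commit();
                    }
                }
            });
  
            ignite.snapshot().createSnapshot("tx_snp").get();
  
            stop.set(true);
            loadFut.get();
  
            // Restore step (out of scope for this PR): stop the grids, replace the node
            // work directories with the snapshot contents, restart and re-run the check.
            long total = 0;
  
            for (int i = 0; i < accounts; i++)
                total += cache.get(i);
  
            assertEquals(accounts * initBalance, total);
        }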

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409043797
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of files containing delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint forced to start a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Name of the temporary directory for loaded remote snapshots and local snapshot delta files. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads performing local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote nodes. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since extra buffers are redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Snapshot currently requested from a remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Distributed process of the take-snapshot operation. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Distributed process that checks the previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory to create file IO for working with partition delta files. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process was performed for an incomplete snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
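+
+    // A minimal illustration of the naming scheme above (assuming PART_FILE_TEMPLATE
+    // resolves to the standard "part-%d.bin" partition file template):
+    //
+    //   partDeltaFileName(0);               // -> "part-0.bin.delta"
+    //   partDeltaFileName(INDEX_PARTITION); // -> "index.bin.delta"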
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the completion listener on its future.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
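+
+    // Receive-side sequence sketch for a single remote partition (illustrative only;
+    // the transmission handler registered above is authoritative). The raw partition
+    // file arrives first through fileHandler(), then delta pages are replayed through
+    // the chunkHandler() consumer:
+    //
+    //   FilePageStore store = createPageStore(receivedPartFile); // hypothetical helper, see fileHandler()
+    //   store.init();
+    //   store.beginRecover();
+    //   for (ByteBuffer page : deltaPages)                       // chunkHandler() consumer
+    //       store.write(PageIO.getPageId(page), page, 0, false);
+    //   store.finishRecover();                                   // then partConsumer.accept(...) is called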
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
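+
+    // Illustrative usage sketch: removing a locally stored snapshot by name, reusing
+    // the helpers defined in this class (the snapshot name is hypothetical).
+    //
+    //   deleteSnapshot(snapshotLocalDir("backup_2020_04_01"), pdsSettings.folderName());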
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of partitions to be snapshot (null means all partitions of a group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
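+
+    // Two-phase protocol sketch (informal summary of the four handlers above):
+    //
+    //   1. startSnpProc: each baseline node runs initLocalSnapshotStartStage(),
+    //      registering a local SnapshotFutureTask bound to the triggered PME.
+    //   2. The coordinator gathers results in processLocalSnapshotStartStageResult()
+    //      and starts endSnpProc, setting hasErr if any node failed or left.
+    //   3. endSnpProc: each node runs initLocalSnapshotEndStage(), deleting the
+    //      incomplete snapshot on error and removing the metastorage key.
+    //   4. processLocalSnapshotEndStageResult() completes the user-visible future.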
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
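+
+    // A minimal usage sketch of the public API backed by this method (illustrative;
+    // assumes the IgniteSnapshot facade is exposed via ignite.snapshot() and that a
+    // baseline topology has already been set):
+    //
+    //   Ignite ignite = Ignition.start(cfg);
+    //   ignite.cluster().active(true);
+    //   ignite.snapshot().createSnapshot("backup_2020_04_01").get();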
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshot.
+     * @param partConsumer Handler for received partition files.
+     * @return Future which will be completed when requested snapshot fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check topology membership first, so the feature check below never dereferences a null node.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
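+
+    // Illustrative usage sketch (grpId, rmtNodeId and the logging consumer are
+    // hypothetical): requesting partition 0 of one cache group from a remote node.
+    //
+    //   Map<Integer, Set<Integer>> parts = Collections.singletonMap(grpId, Collections.singleton(0));
+    //
+    //   snapshotMgr.createRemoteSnapshot(rmtNodeId, parts,
+    //       (file, pair) -> log.info("Received " + file + " for " + pair)).get();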
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Id of the node which caused the snapshot operation.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshot.
+     * @param snpSndr Snapshot sender instance which receives the collected data.
+     * @return Snapshot operation task which should be registered on a checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or was already restored. The metastorage key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Id of the remote node for which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name.
+     * @return Relative path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409104563
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
 ##########
 @@ -0,0 +1,881 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BooleanSupplier;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.pagemem.PageIdUtils;
+import org.apache.ignite.internal.pagemem.store.PageStore;
+import org.apache.ignite.internal.pagemem.store.PageWriteListener;
+import org.apache.ignite.internal.processors.cache.CacheGroupContext;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopology;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.lang.IgniteThrowableRunner;
+import org.apache.ignite.internal.util.tostring.GridToStringExclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.partDeltaFile;
+
+/**
+ *
+ */
+class SnapshotFutureTask extends GridFutureAdapter<Boolean> implements DbCheckpointListener {
+    /** Shared context. */
+    private final GridCacheSharedContext<?, ?> cctx;
+
+    /** Ignite logger */
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410164657
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java
 ##########
 @@ -1404,7 +1419,7 @@ private ExchangeType onServerNodeEvent(boolean crd) throws IgniteCheckedExceptio
     /**
      * @return Exchange type.
      */
-    private ExchangeType onExchangeFreeSwitch() {
+    private ExchangeType onExchangeFreeSwitchOnLeft() {
 
 Review comment:
   onExchangeFreeSwitchNodeLeft?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408149033
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory
+     * can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete incomplete files if needed. */
 
 Review comment:
   fixed
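
  For context on the hunk above: the class javadoc lists two snapshot actions, and the cluster-wide
  one is driven through the public IgniteSnapshot API introduced by this PR. A minimal usage sketch,
  assuming a node started from an XML config with persistence enabled (the config path and snapshot
  name below are placeholders):

      import org.apache.ignite.Ignite;
      import org.apache.ignite.Ignition;
      import org.apache.ignite.cluster.ClusterState;

      public class CreateSnapshotExample {
          public static void main(String[] args) {
              try (Ignite ignite = Ignition.start("ignite-config.xml")) {
                  // Snapshots require an active cluster.
                  ignite.cluster().state(ClusterState.ACTIVE);

                  // Triggers the distributed snapshot process over all persistent cache groups;
                  // the future completes when every baseline node finishes its local snapshot task.
                  ignite.snapshot().createSnapshot("backup_snapshot").get();
              }
          }
      }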

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410052416
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create client accounts, each with the initial balance.
+        for (int i = 0; i < clients; i++) {
+            eastCache.put(i, new Account(i, balance));
+            westCache.put(i, new Account(i, balance));
+        }
+
+        assertEquals("The initial summary value in all caches is not correct.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
+
+        forceCheckpoint();
+
+        IgniteInternalFuture<?> txLoadFut = GridTestUtils.runMultiThreadedAsync(
+            () -> {
+                ThreadLocalRandom rnd = ThreadLocalRandom.current();
+
+                int amount;
+
+                try {
+                    while (!stop.get()) {
+                        IgniteEx ignite = grid(rnd.nextInt(grids));
+                        IgniteCache<Integer, Account> east = ignite.cache("east");
+                        IgniteCache<Integer, Account> west = ignite.cache("west");
+
+                        amount = rnd.nextInt(transferLimit);
+
+                        try (Transaction tx = ignite.transactions().txStart()) {
+                            Integer id = rnd.nextInt(clients);
+
+                            Account acc0 = east.get(id);
+                            Account acc1 = west.get(id);
+
+                            acc0.balance -= amount;
+
+                            txStarted.countDown();
+
+                            acc1.balance += amount;
+
+                            east.put(id, acc0);
+                            west.put(id, acc1);
+
+                            tx.commit();
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, e);
+
+                    fail("Tx must not be failed.");
+                }
+            }, transferThreadCnt, "transfer-account-thread-");
+
+        try {
+            U.await(txStarted);
+
+            grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get();
+        }
+        finally {
+            stop.set(true);
+        }
+
+        txLoadFut.get();
+
+        assertEquals("The summary value should not changed during tx transfers.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
+
+        stopAllGrids();
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, SNAPSHOT_NAME);
+
+        assertEquals("The total amount of all cache values must not changed in snapshot.",
+            total, sumAllCacheValues(snpIg0, clients, eastCcfg.getName(), westCcfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithCacheNodeFilter() throws Exception {
+        int grids = 4;
+
+        CacheConfiguration<Integer, Integer> ccfg = txCacheConfig(new CacheConfiguration<Integer, Integer>(DEFAULT_CACHE_NAME))
+            .setNodeFilter(node -> node.consistentId().toString().endsWith("1"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration()));
+
+        IgniteEx ig0 = grid(0);
+
+        ig0.cluster().baselineAutoAdjustEnabled(false);
+        ig0.cluster().state(ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.getOrCreateCache(ccfg).put(i, i);
+
+        forceCheckpoint();
 
 Review comment:
   Redundant
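
  A side note on the test hunk above: the assertions rely on a sumAllCacheValues helper that is not
  shown in this excerpt. A plausible shape for it, purely as a hypothetical reconstruction (the real
  helper lives elsewhere in the test class), is to sum the Account balances over the given caches:

      /** Hypothetical reconstruction of the helper referenced by the assertions above. */
      private static int sumAllCacheValues(Ignite node, int clients, String... cacheNames) {
          int sum = 0;

          for (String cacheName : cacheNames) {
              IgniteCache<Integer, Account> cache = node.cache(cacheName);

              // Account ids run from 0 to clients - 1, as populated by the test setup.
              for (int id = 0; id < clients; id++)
                  sum += cache.get(id).balance;
          }

          return sum;
      }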

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205509
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/IgniteCacheDatabaseSharedManager.java
 ##########
 @@ -156,7 +156,6 @@
     /** First eviction was warned flag. */
     private volatile boolean firstEvictWarn;
 
-
 
 Review comment:
   Changes reverted

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408151987
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory
+     * can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when #takeSnapshot() method already invoked and distributed process
+                        // starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverse the snapshot directory for the given local node folder name and
+     * recursively delete all files from it if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of cache partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void startLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
 
 Review comment:
   Fixed
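
As background for the diff above, here is a minimal standalone sketch of the walk-and-collect deletion pattern used by deleteSnapshot: walk the tree once to find matching node folders, tolerate concurrently removed files, then delete the collected directories. All class and method names here are hypothetical, and plain java.nio recursive deletion stands in for Ignite's U.delete helper.

    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Stream;

    public class SnapshotDirCleaner {
        /** Collects every directory named {@code folderName} under {@code root}, then deletes each recursively. */
        public static void deleteMatchingDirs(Path root, String folderName) throws IOException {
            if (!Files.exists(root))
                return;

            List<Path> dirs = new ArrayList<>();

            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    // Remember matching node folders; actual deletion happens after the walk.
                    if (dir.getFileName() != null && folderName.equals(dir.getFileName().toString()))
                        dirs.add(dir);

                    return FileVisitResult.CONTINUE;
                }

                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    // Tolerate files removed concurrently while walking, as the PR code does.
                    return FileVisitResult.CONTINUE;
                }
            });

            for (Path dir : dirs) {
                // Delete children before parents by walking in reverse depth order.
                try (Stream<Path> paths = Files.walk(dir)) {
                    paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                        try {
                            Files.deleteIfExists(p);
                        }
                        catch (IOException ignored) {
                            // Best-effort cleanup, mirroring U.delete semantics.
                        }
                    });
                }
            }
        }
    }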


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r407983773
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requests this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled due to the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}) since that is redundant and can lead to OOM errors. A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, but off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources being used. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check previously preformed snapshot operation and delete uncompleted files if need. */
 
 Review comment:
   preformed -> performed


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409022139
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requests this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled due to the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}) since that is redundant and can lead to OOM errors. A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, but off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources being used. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
 
 Review comment:
  Discussed privately. A factory has been added to the `DistributedProcess` configuration, so it is now possible to start a process with a custom inheritor of `InitMessage`, including `SnapshotDiscoveryMessage`; a rough sketch of the idea follows.
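
To make the agreed change concrete, here is a rough sketch of the factory-hook idea. This is not the actual `DistributedProcess` API: every name below is hypothetical and only illustrates how a message factory would let the snapshot manager supply its own `InitMessage` subclass instead of intercepting and re-dispatching the discovery event as in the listener above.

    import java.io.Serializable;
    import java.util.UUID;
    import java.util.function.BiFunction;

    // Stand-in for the InitMessage hierarchy; the real classes live in
    // org.apache.ignite.internal.util.distributed.
    class InitMessageSketch<Req extends Serializable> implements Serializable {
        final UUID processId;
        final Req request;

        InitMessageSketch(UUID processId, Req request) {
            this.processId = processId;
            this.request = request;
        }
    }

    // Hypothetical subclass standing in for SnapshotStartDiscoveryMessage: same payload,
    // but the discovery layer can treat it as a message that also triggers PME.
    class SnapshotInitMessageSketch<Req extends Serializable> extends InitMessageSketch<Req> {
        SnapshotInitMessageSketch(UUID processId, Req request) {
            super(processId, request);
        }
    }

    // Minimal model of a distributed process configured with a custom message factory.
    class DistributedProcessSketch<Req extends Serializable> {
        private final BiFunction<UUID, Req, InitMessageSketch<Req>> initMsgFactory;

        DistributedProcessSketch(BiFunction<UUID, Req, InitMessageSketch<Req>> initMsgFactory) {
            this.initMsgFactory = initMsgFactory;
        }

        void start(UUID processId, Req request) {
            // The factory decides the concrete discovery message type.
            InitMessageSketch<Req> msg = initMsgFactory.apply(processId, request);

            sendCustomDiscoveryEvent(msg);
        }

        private void sendCustomDiscoveryEvent(InitMessageSketch<Req> msg) {
            // Discovery SPI send elided in this sketch.
        }
    }

With such a hook the snapshot manager would be constructed with `SnapshotInitMessageSketch::new` as its factory, so no custom-event rewriting is needed in the discovery listener.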


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408728773
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotMXBeanTest.java
 ##########
 @@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.mxbean.SnapshotMXBean;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+
+/**
+ * Tests {@link SnapshotMXBean}.
+ */
+public class IgniteSnapshotMXBeanTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testCreateSnapshot() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        SnapshotMXBean mxBean = getMBean(ignite.name());
+
+        mxBean.createSnapshot(SNAPSHOT_NAME);
+
+        MetricRegistry mreg = ignite.context().metric().registry(SNAPSHOT_METRICS);
+
+        LongMetric endTime = mreg.findMetric("LastSnapshotEndTime");
 
 Review comment:
  Please test this value via JMX rather than through the internal metric registry; see the sketch below.
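
For illustration, a minimal sketch of reading the metric through JMX instead of the internal registry. The ObjectName pattern is an assumption made for this example; the exact domain and key layout depend on how Ignite exports metric registries as MBeans.

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class JmxSnapshotMetricReader {
        /** Finds the snapshot metric registry MBean and reads one long attribute. */
        public static long readLastSnapshotEndTime() throws Exception {
            MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

            // Hypothetical pattern: match any domain exposing group=metrics,name=snapshot.
            Set<ObjectName> names = srv.queryNames(
                new ObjectName("*:group=metrics,name=snapshot,*"), null);

            if (names.isEmpty())
                throw new IllegalStateException("Snapshot metrics MBean not found");

            // Attribute name mirrors the metric registered in the manager above.
            return (Long)srv.getAttribute(names.iterator().next(), "LastSnapshotEndTime");
        }
    }

A test could then assert that this value becomes positive once the snapshot future completes.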


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409107800
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for a checkpoint that starts the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to a snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
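+    // Usage sketch (an illustrative assumption, not quoted from the change itself): a sender
+    // would fill these keys into the file transmission params map, e.g.
+    //   Map<String, Serializable> params = new HashMap<>();
+    //   params.put(SNP_NAME_PARAM, snpName);
+    //   params.put(SNP_GRP_ID_PARAM, grpId);
+    //   params.put(SNP_PART_ID_PARAM, partId);
+    // and the receiver reads them back via TransmissionMeta#params() in the
+    // TransmissionHandler callbacks registered below.
+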
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}) since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory may be exhausted before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
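+        // Snapshot creation runs as a two-phase distributed process: START_SNAPSHOT triggers
+        // the local snapshot start stage on each node, and END_SNAPSHOT verifies the results
+        // and cleans up uncompleted files if needed (see the stage handlers wired below).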
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
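+        // For illustration (assuming PART_FILE_TEMPLATE resolves to "part-%d.bin" and
+        // INDEX_FILE_NAME to "index.bin"): partDeltaFileName(5) -> "part-5.bin.delta",
+        // partDeltaFileName(INDEX_PARTITION) -> "index.bin.delta".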
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408150026
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for a checkpoint that starts the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to a snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}) since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory may be exhausted before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverse the snapshot directory for the given local node folder name and
+     * recursively delete all files from it if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group and partition pairs to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
 
 Review comment:
   This factory cannot be simply removed, but I've rechecked the source code and created the factory below. Seems it should cover all cases:
   
   ```
       Function<String, SnapshotSender> localSnapshotSenderFactory() {
           return LocalSnapshotSender::new;
       }
   ```
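   
   For illustration, a test could then substitute this factory to observe or fail partition transfers. A minimal sketch, assuming a setter counterpart of the getter above exists (the setter name here is an assumption, not confirmed API):
   
   ```
       // Hypothetical test-side substitution; 'mgr' is the IgniteSnapshotManager under test
       // and the setter is assumed for illustration only.
       Function<String, SnapshotSender> dfltFactory = mgr.localSnapshotSenderFactory();

       mgr.localSnapshotSenderFactory(snpName -> {
           SnapshotSender delegate = dfltFactory.apply(snpName);

           // Wrap or replace the delegate here before returning it to the snapshot task.
           return delegate;
       });
   ```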

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409739366
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take snapshot from the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // zero out the sign bit
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+                }
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                throw new RuntimeException(e);
+            }
+        }, 3, "cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(snpName);
+
+            U.await(loadLatch, 10, TimeUnit.SECONDS);
+
+            fut.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // The cluster can be deactivated, but we must also test snapshot restore after binary recovery has occurred.
+        stopAllGrids();
+
+        assertTrue("Snapshot directory must be empty for node not in baseline topology: " + notBltDirName,
+            !searchDirectoryRecursively(locSnpDir.toPath(), notBltDirName).isPresent());
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, snpName);
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + DEFAULT_CACHE_NAME,
+            CACHE_KEYS_RANGE, snpIg0.cache(DEFAULT_CACHE_NAME).size());
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + txCcfg.getName(),
+            CACHE_KEYS_RANGE, snpIg0.cache(txCcfg.getName()).size());
+
+        snpIg0.cache(DEFAULT_CACHE_NAME).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contains only negative values " +
+                "[cache=" + DEFAULT_CACHE_NAME + ", entry=" + e +']', (Integer)e.getValue() < 0));
+
+        snpIg0.cache(txCcfg.getName()).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contains only negative values " +
+                "[cache=" + txCcfg.getName() + ", entry=" + e + ']', (Integer)e.getValue() < 0));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotPrimaryBackupsTheSame() throws Exception {
+        int grids = 3;
+        AtomicInteger cacheKey = new AtomicInteger();
+
+        IgniteEx ignite = startGridsWithCache(grids, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        IgniteInternalFuture<Long> atLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                grid(gId).cache(DEFAULT_CACHE_NAME)
+                    .put(cacheKey.incrementAndGet(), 0);
+            }
+        }, 5, "atomic-cache-put-");
+
+        IgniteInternalFuture<Long> txLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                IgniteCache<Integer, Integer> txCache = grid(gId).getOrCreateCache(txCcfg);
+
+                try (Transaction tx = grid(gId).transactions().txStart()) {
+                    txCache.put(cacheKey.incrementAndGet(), 0);
+
+                    tx.commit();
+                }
+            }
+        }, 5, "tx-cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+            fut.get();
+        }
+        finally {
+            txLoadFut.cancel();
+            atLoadFut.cancel();
+        }
+
+        stopAllGrids();
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, cfg -> resolveSnapshotWorkDirectory(cfg).getAbsolutePath(), SNAPSHOT_NAME, false);
+
+        // Block rebalancing entirely.
+        for (Ignite g : G.allGrids())
+            TestRecordingCommunicationSpi.spi(g).blockMessages((node, msg) -> msg instanceof GridDhtPartitionDemandMessage);
+
+        snpIg0.cluster().state(ACTIVE);
+
+        assertFalse("Primary and backup in snapshot must have the same counters. Rebalance must not happen.",
+            GridTestUtils.waitForCondition(() -> {
+                boolean hasMsgs = false;
+
+                for (Ignite g : G.allGrids())
+                    hasMsgs |= TestRecordingCommunicationSpi.spi(g).hasBlockedMessages();
+
+                return hasMsgs;
+            }, REBALANCE_AWAIT_TIME));
+
+        TestRecordingCommunicationSpi.stopBlockAll();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRejectCacheStopDuringClusterSnapshot() throws Exception {
+        // Block the full message so that the cluster-wide snapshot operation cannot fully complete.
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> {
+            if (msg instanceof FullMessage) {
+                FullMessage<?> msg0 = (FullMessage<?>)msg;
+
+                assertEquals("Snapshot distributed process must be used",
+                    DistributedProcess.DistributedProcessType.START_SNAPSHOT.ordinal(), msg0.type());
+
+                assertTrue("Snapshot has to be finished successfully on all nodes", msg0.error().isEmpty());
+
+                return true;
+            }
+
+            return false;
+        });
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // Creation of new caches should not be blocked.
+        ignite.getOrCreateCache(dfltCacheCfg.setName("default2"))
+            .put(1, 1);
+
+        forceCheckpoint();
+
+        assertThrowsAnyCause(log,
+            () -> {
+                ignite.destroyCache(DEFAULT_CACHE_NAME);
+
+                return 0;
+            },
+            IgniteCheckedException.class,
+            SNP_IN_PROGRESS_ERR_MSG);
+
+        spi.unblock();
+
+        fut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testBltChangeDuringClusterSnapshot() throws Exception {
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(3);
+
+        long topVer = ignite.cluster().topologyVersion();
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // A node outside the baseline joins successfully.
+        String grid4Dir = folderName(startGrid(4));
+
+        // The non-baseline node leaves the cluster; the snapshot is not affected.
+        stopGrid(4);
+
+        // Client node must connect successfully.
+        startClientGrid(4);
+
+        // Changing the baseline completes successfully.
+        ignite.cluster().setBaselineTopology(topVer);
+
+        spi.unblock();
+
+        fut.get();
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + grid4Dir,
+            !searchDirectoryRecursively(snp(ignite).snapshotLocalDir(SNAPSHOT_NAME).toPath(), grid4Dir).isPresent());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotExOnInitiatorLeft() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        ignite.close();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            NodeStoppingException.class,
+            SNP_NODE_STOPPING_ERR_MSG);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotExistsException() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteException.class,
+            "Snapshot with given name already exists.");
+
+        stopAllGrids();
+
+        // Check that snapshot has not been accidentally deleted.
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotCleanedOnLeft() throws Exception {
+        CountDownLatch block = new CountDownLatch(1);
+        CountDownLatch partProcessed = new CountDownLatch(1);
+
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+        String dirNameIgnite0 = folderName(ignite);
+
+        String dirNameIgnite1 = folderName(grid(1));
+
+        snp(grid(1)).localSnapshotSenderFactory(
+            blockingLocalSnapshotSender(grid(1), partProcessed, block));
+
+        TestRecordingCommunicationSpi commSpi1 = TestRecordingCommunicationSpi.spi(grid(1));
+        commSpi1.blockMessages((node, msg) -> msg instanceof SingleNodeMessage);
+
+        IgniteFuture<?> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        U.await(partProcessed);
+
+        stopGrid(1);
+
+        block.countDown();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + dirNameIgnite0,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite0).isPresent());
+
+        startGrid(1);
+
+        awaitPartitionMapExchange();
+
+        // Snapshot directory must be cleaned.
+        assertTrue("Snapshot directory must be empty for node 1 due to snapshot future fail: " + dirNameIgnite1,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite1).isPresent());
+
+        List<String> allSnapshots = snp(ignite).getSnapshots();
+
+        assertTrue("Snapshot directory must be empty due to snapshot fail: " + allSnapshots,
+            allSnapshots.isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRecoveryClusterSnapshotJvmHalted() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        String grid0Dir = folderName(ignite);
+        String grid1Dir = folderName(grid(1));
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+
+        jvm = true;
+
+        IgniteConfiguration cfg2 = optimize(getConfiguration(getTestIgniteInstanceName(2)));
+
+        cfg2.getDataStorageConfiguration()
+            .setFileIOFactory(new HaltJvmFileIOFactory(new RandomAccessFileIOFactory(),
+                (Predicate<File> & Serializable) file -> {
+                    // Trying to create FileIO over partition file.
+                    return file.getAbsolutePath().contains(SNAPSHOT_NAME);
+                }));
+
+        startGrid(cfg2);
+
+        String grid2Dir = U.maskForFileName(cfg2.getConsistentId().toString());
+
+        jvm = false;
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        awaitPartitionMapExchange();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty: " + grid0Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid0Dir).isPresent());
+
+        assertTrue("Snapshot directory must be empty: " + grid1Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid1Dir).isPresent());
+
+        assertTrue("Snapshot directory must exist due to grid2 has been halted and cleanup not fully performed: " + grid2Dir,
+            searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        IgniteEx grid2 = startGrid(2);
+
+        assertTrue("Snapshot directory must be empty after recovery: " + grid2Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        awaitPartitionMapExchange();
+
+        assertTrue("Snapshot directory must be empty", grid2.snapshot().getSnapshots().isEmpty());
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
+            .get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithRebalancing() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi commSpi = TestRecordingCommunicationSpi.spi(ignite);
+        commSpi.blockMessages((node, msg) -> msg instanceof GridDhtPartitionSupplyMessage);
+
+        startGrid(2);
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        commSpi.waitForBlocked();
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        commSpi.stopBlock(true);
+
+        fut.get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(3, SNAPSHOT_NAME);
+
+        awaitPartitionMapExchange();
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithExplicitPath() throws Exception {
+        File exSnpDir = U.resolveWorkDirectory(U.defaultWorkDirectory(), "ex_snapshots", true);
+
+        try {
+            IgniteEx ignite = null;
+
+            for (int i = 0; i < 2; i++) {
+                IgniteConfiguration cfg = optimize(getConfiguration(getTestIgniteInstanceName(i)));
+
+                cfg.setSnapshotPath(exSnpDir.getAbsolutePath());
+
+                ignite = startGrid(cfg);
+            }
+
+            ignite.cluster().baselineAutoAdjustEnabled(false);
+            ignite.cluster().state(ACTIVE);
+
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ignite.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+            forceCheckpoint();
+
+            ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
+                .get();
+
+            stopAllGrids();
+
+            IgniteEx snp = startGridsFromSnapshot(2, cfg -> exSnpDir.getAbsolutePath(), SNAPSHOT_NAME, true);
+
+            assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+        }
+        finally {
+            stopAllGrids();
+
+            U.delete(exSnpDir);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotMetrics() throws Exception {
+        String newSnapshotName = SNAPSHOT_NAME + "_new";
+        CountDownLatch deltaApply = new CountDownLatch(1);
+        CountDownLatch deltaBlock = new CountDownLatch(1);
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        MetricRegistry mreg0 = ignite.context().metric().registry(SNAPSHOT_METRICS);
+
+        LongMetric startTime = mreg0.findMetric("LastSnapshotStartTime");
 
 Review comment:
   Fixed.


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408799069
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups, triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint started to enforce the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path, including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process has occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread, so no ordering guarantees are violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Concurrently traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of their partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids to the sets of their partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // The remote node must be checked for null before it is dereferenced.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group ids to the sets of their partitions to be snapshotted.
+     * @param snpSndr Snapshot sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
 
 Review comment:
  Just a notice: perhaps `localSnapshotSenderFactory` should be a getter for `locSndrFactory`, and `locSndrFactory` should be `LocalSnapshotSender::new` by default. Also, `setLocalSnapshotSenderFactory` can be renamed to match Ignite-style setters, as sketched below.
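
  A minimal sketch of the suggested shape (names and the default factory are taken from this suggestion, not from the patch; it assumes `LocalSnapshotSender` has a single `String` snapshot-name constructor, and uses Ignite-style setters that reuse the getter name rather than a `set` prefix):

      /** Local snapshot sender factory. */
      private Function<String, SnapshotSender> locSndrFactory = LocalSnapshotSender::new;

      /** @return Factory which produces {@link LocalSnapshotSender} implementations. */
      Function<String, SnapshotSender> localSnapshotSenderFactory() {
          return locSndrFactory;
      }

      /** @param locSndrFactory Factory which produces {@link LocalSnapshotSender} implementations. */
      void localSnapshotSenderFactory(Function<String, SnapshotSender> locSndrFactory) {
          this.locSndrFactory = locSndrFactory;
      }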


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408270874
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1906 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
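+ * <p>
+ * For instance, a cluster-wide snapshot can be requested via the public API
+ * (a usage sketch only, assuming the {@code Ignite#snapshot()} accessor exposes the
+ * {@link IgniteSnapshot} facade implemented by this manager):
+ * <pre>
+ *     // Start a cluster-wide snapshot and wait until it completes on all baseline nodes.
+ *     IgniteFuture&lt;Void&gt; fut = ignite.snapshot().createSnapshot("backupSnapshot");
+ *
+ *     fut.get();
+ * </pre>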
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled due to the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}): that would be redundant and could lead to OOM errors. A direct buffer
+     * is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverses the snapshot directory for the given local node folder name
+     * and recursively deletes all files from it, if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group ids and the corresponding cache partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has finished with an error. " +
+                        "Local snapshot tasks may not have finished completely, or finalizing the results failed " +
+                        "[hasErr=" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation started.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
 
 Review comment:
  As far as I understand, if two snapshot operations with different names are requested concurrently on different nodes, both will register `clusterSnpFut`; one of them will fail in `initLocalSnapshotStartStage`, but the futures for both operations will be completed successfully, and only after the real snapshot operation completes.
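  A minimal test-style sketch of the race (hypothetical: `grid(n)` comes from Ignite's test framework, and the snapshot names are made up):

  ```java
  @Test
  public void testConcurrentSnapshotsWithDifferentNames() throws Exception {
      // Two nodes request differently-named snapshots at (almost) the same time.
      IgniteFuture<Void> futA = grid(0).snapshot().createSnapshot("snapshot_A");
      IgniteFuture<Void> futB = grid(1).snapshot().createSnapshot("snapshot_B");

      // Expected: exactly one of these fails with "Another snapshot operation in progress".
      // Suspected actual behavior: both nodes register clusterSnpFut locally, so both
      // futures complete successfully once the single real snapshot operation finishes.
      futA.get();
      futB.get();
  }
  ```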


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408216303
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotMXBeanImpl.java
 ##########
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.List;
+import org.apache.ignite.internal.GridKernalContextImpl;
+import org.apache.ignite.mxbean.SnapshotMXBean;
+
+/**
+ * Snapshot MBean features.
+ */
+public class SnapshotMXBeanImpl implements SnapshotMXBean {
+    /** Instance of snapshot cache shared manager. */
+    private final IgniteSnapshotManager mgr;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotMXBeanImpl(GridKernalContextImpl ctx) {
+        mgr = ctx.cache().context().snapshotMgr();
+    }
+
+    /** {@inheritDoc} */
+    @Override public void createSnapshot(String snpName) {
+        mgr.createSnapshot(snpName).get();
 
 Review comment:
   A snapshot operation can be rather long, and it's bad for the user experience to hang on this MXBean call. Instead, there should be an async operation plus attributes for the current snapshot status (at least something like the last snapshot name and whether it is in progress, completed successfully, or completed with an error).
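   A possible non-blocking shape for the bean, as a sketch only (all names below are hypothetical suggestions, not part of the PR):

   ```java
   public interface SnapshotMXBean {
       /** Starts snapshot creation and returns immediately, without waiting for completion. */
       void createSnapshotAsync(String snpName);

       /** @return Name of the last requested snapshot, or {@code null} if none was requested. */
       String getLastSnapshotName();

       /** @return Status of the last requested snapshot: IN_PROGRESS, COMPLETED or FAILED. */
       String getLastSnapshotStatus();
   }
   ```

   The implementation could delegate to `IgniteSnapshotManager.createSnapshot(snpName)` without blocking on the returned future and derive the status attributes from the future the manager already tracks.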


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205303
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java
 ##########
 @@ -1404,7 +1419,7 @@ private ExchangeType onServerNodeEvent(boolean crd) throws IgniteCheckedExceptio
     /**
      * @return Exchange type.
      */
-    private ExchangeType onExchangeFreeSwitch() {
+    private ExchangeType onExchangeFreeSwitchOnLeft() {
 
 Review comment:
   Fixed.


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408230228
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition files consisting of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that is redundant and can lead to OOM errors. A direct
+     * buffer is deallocated only when the ByteBuffer is garbage collected, but the node can run out of off-heap memory before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked
+                        // and the distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverses the snapshot directory for the given local node folder name
+     * and recursively deletes all files from it, if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group ids and the corresponding cache partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void startLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> endLocalSnapshot(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void endLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if a snapshot operation is in progress.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "The previous snapshot operation was not completed."));
+            }
+
+            if (clusterSnpRq != null) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Parallel snapshot processes are not allowed."));
+            }
+
+            if (getSnapshots().contains(name))
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Snapshot with given name already exists."));
+
+            snpFut0 = new GridFutureAdapter<>();
+
+            clusterSnpFut = snpFut0;
+        }
+
+        List<Integer> grps = cctx.cache().persistentGroups().stream()
+            .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+            .filter(g -> !g.config().isEncryptionEnabled())
+            .map(CacheGroupDescriptor::groupId)
+            .collect(Collectors.toList());
+
+        List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+        startSnpProc.start(UUID.randomUUID(), new SnapshotOperationRequest(cctx.localNodeId(),
+            name,
+            grps,
+            new HashSet<>(F.viewReadOnly(srvNodes,
+                F.node2id(),
+                (node) -> CU.baselineNode(node, clusterState)))));
+
+        if (log.isInfoEnabled())
+            log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+        return new IgniteFutureImpl<>(snpFut0);
+    }
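
For context, this method backs the public IgniteSnapshot facade. A minimal usage sketch, assuming an active persistence-enabled cluster, that the facade is obtained via ignite.snapshot(), and that getSnapshots() is exposed on the same facade (the snapshot name is illustrative):

    // Trigger a cluster-wide snapshot and block until all nodes have written their partitions.
    IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backup_2020_04_01");
    fut.get();

    // The new snapshot should now be listed on every baseline node.
    assert ignite.snapshot().getSnapshots().contains("backup_2020_04_01");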
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Collection of cache group and partition pairs to be snapshotted.
+     * @param partConsumer Handler for received partition files.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
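
A sketch of how an internal caller might use this API; snapshotMgr, rmtNodeId and log stand in for objects available in the caller's context, and the group id lookup via CU.cacheId assumes a cache without an explicit group name:

    Map<Integer, Set<Integer>> parts = new HashMap<>();
    parts.put(CU.cacheId("myCache"), new HashSet<>(Arrays.asList(0, 1, 2))); // group id -> partitions

    snapshotMgr.createRemoteSnapshot(rmtNodeId, parts,
        (file, pair) -> log.info("Partition received [file=" + file + ", pair=" + pair + ']'))
        .get(); // completes when all requested partitions are received and recovered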
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Collection of cache group and partition pairs to be snapshotted.
+     * @param snpSndr Sender instance used to process the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot sender instance.
+     */
+    SnapshotSender localSnapshotSender(String snpName) {
+        return new LocalSnapshotSender(snpName);
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> igniteCacheStoragePath(pdsSettings),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String igniteCacheStoragePath(PdsFolderSettings pcfg) {
+        return Paths.get(DB_DEFAULT_FOLDER, pcfg.folderName()).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static Path snapshotPath(IgniteConfiguration cfg) {
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r407992112
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since extra buffers are redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * can run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
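
Assuming Ignite's standard page store naming, where PART_FILE_TEMPLATE is "part-%d.bin" and INDEX_FILE_NAME is "index.bin", the resulting delta file names look like:

    partDeltaFileName(12);              // "part-12.bin.delta"
    partDeltaFileName(INDEX_PARTITION); // "index.bin.delta" (INDEX_DELTA_NAME)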
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try stop all snapshot processing if not yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Traverses the snapshot directory for the given local node folder name and recursively
+     * deletes all files from it, if they exist (tolerating concurrent removals).
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from the file tree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
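
A usage sketch of this cleanup path, mirroring how endLocalSnapshot() calls it in this patch; snpMgr stands in for the manager instance, pdsSettings is only accessible inside the manager, and the snapshot name is illustrative:

    // Remove the local node's folder from an incomplete snapshot; the snapshot
    // directory itself is removed once its db/ subdirectory becomes empty.
    File snpDir = snpMgr.snapshotLocalDir("backup_2020_04_01");
    IgniteSnapshotManager.deleteSnapshot(snpDir, pdsSettings.folderName());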
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
 
 Review comment:
   `locSndrFactory` (perhaps `localSnapshotSender()` too) is redundant, it's used only once and doesn't add readability, `new LocalSnapshotSender()` can be used directly here.
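
A sketch of the suggested simplification, assuming LocalSnapshotSender exposes a constructor taking the snapshot name (as the localSnapshotSender(snpName) factory method implies):

    SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
        req.srcNodeId,
        parts,
        new LocalSnapshotSender(req.snpName)); // direct construction instead of the factory indirection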

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409916792
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
 
 Review comment:
   Test added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409738545
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take a snapshot of the whole cluster and check snapshot consistency.
+     * Note: client nodes and server nodes outside the baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // zero out the sign bit
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
 
 Review comment:
   Fixed.
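
   As a side note on the quoted load loop: `R.nextInt() & Integer.MAX_VALUE` clears bit 31 (the sign bit) and leaves the other 31 bits intact, so any `int` maps to a non-negative value. A self-contained sketch of the trick:

   ```java
   import java.util.Random;

   public class SignBitDemo {
       public static void main(String[] args) {
           Random r = new Random();

           for (int i = 0; i < 5; i++) {
               int raw = r.nextInt();                // may be negative
               int nonNeg = raw & Integer.MAX_VALUE; // 0x7FFFFFFF mask clears the sign bit

               System.out.println(raw + " -> " + nonNeg); // always >= 0
           }
       }
   }
   ```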

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409394177
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take a snapshot of the whole cluster and check snapshot consistency.
+     * Note: client nodes and server nodes outside the baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // zero out the sign bit
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
 
 Review comment:
   DEFAULT_CACHE_NAME is also transactional, see dfltCacheCfg
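
   For contrast, a minimal sketch of a cache configuration that would actually exercise the atomic path; the cache name is illustrative, and `CACHE_PARTS_COUNT` is reused from the quoted test:

   ```java
   // Illustrative only: an ATOMIC counterpart to txCcfg for this test.
   CacheConfiguration<Integer, Integer> atomicCcfg =
       new CacheConfiguration<Integer, Integer>("atomicCacheName")
           .setAtomicityMode(CacheAtomicityMode.ATOMIC)
           .setBackups(2)
           .setAffinity(new RendezvousAffinityFunction(false)
               .setPartitions(CACHE_PARTS_COUNT));
   ```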

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410206839
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java
 ##########
 @@ -937,6 +939,19 @@ else if (msg instanceof WalStateAbstractMessage)
             for (PartitionsExchangeAware comp : cctx.exchange().exchangeAwareComponents())
                 comp.onInitAfterTopologyLock(this);
 
+            // For pme-free exchanges onInitAfterTopologyLock must be
+            // invoked prior to onDoneBeforeTopologyUnlock
 
 Review comment:
   Fixed.
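
   For context, a hedged sketch of a component that depends on the ordering guarantee referenced in the quoted hunk; the callback names come from `PartitionsExchangeAware` as used elsewhere in this PR, but the component itself is hypothetical:

   ```java
   // Hypothetical component relying on onInitAfterTopologyLock being invoked
   // before onDoneBeforeTopologyUnlock, including on the PME-free path.
   class OrderingDependentComponent implements PartitionsExchangeAware {
       private volatile boolean initDone;

       @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
           initDone = true; // e.g. snapshot bookkeeping under the topology lock
       }

       @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
           assert initDone : "init-stage callback must have been invoked first";
       }
   }
   ```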

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408768351
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheProcessor.java
 ##########
 @@ -1994,7 +1997,7 @@ private GridCacheContext prepareCacheContext(
      *
      * @param cctx Cache context.
      */
-    private void stopCacheSafely(GridCacheContext<?, ?> cctx) {
+    public void stopCacheSafely(GridCacheContext<?, ?> cctx) {
 
 Review comment:
   Used only inside this class

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409102997
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotMXBeanImpl.java
 ##########
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.List;
+import org.apache.ignite.internal.GridKernalContextImpl;
+import org.apache.ignite.mxbean.SnapshotMXBean;
+
+/**
+ * Snapshot MBean features.
+ */
+public class SnapshotMXBeanImpl implements SnapshotMXBean {
+    /** Instance of snapshot cache shared manager. */
+    private final IgniteSnapshotManager mgr;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotMXBeanImpl(GridKernalContextImpl ctx) {
+        mgr = ctx.cache().context().snapshotMgr();
+    }
+
+    /** {@inheritDoc} */
+    @Override public void createSnapshot(String snpName) {
+        mgr.createSnapshot(snpName).get();
 
 Review comment:
   Cluster snapshot metrics extended, `get()` removed.
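
   For illustration, a minimal sketch of the non-blocking variant, assuming `createSnapshot` returns an `IgniteFuture` as the quoted call chain suggests; the listener body is illustrative:

   ```java
   /** {@inheritDoc} */
   @Override public void createSnapshot(String snpName) {
       // Trigger the operation and return immediately; progress and errors
       // are observable through the snapshot metrics instead of a blocking get().
       IgniteFuture<Void> fut = mgr.createSnapshot(snpName);

       fut.listen(f -> {
           // Illustrative only: completion (or failure) is reflected in metrics.
       });
   }
   ```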

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408970649
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for a partition file consisting of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint forced to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to a snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread: creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter} is redundant and can lead to OOM errors. A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect resources while they are in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Distributed process to start the snapshot operation. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Distributed process to check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with partition delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
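+        // Snapshot creation is coordinated as a two-phase distributed process:
+        // START_SNAPSHOT schedules the local snapshot task on each baseline node,
+        // END_SNAPSHOT checks the result and deletes uncompleted files if needed.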
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
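+                            // All delta page bytes for this partition have been received; finalize its recovery.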
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
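+                            // Wrap the received partition file with a page store; delta pages for this
+                            // partition will be applied to it later by the chunk handler.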
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing tasks if they are not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files; concurrently removed entries are tolerated.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which may have been concurrently removed from the file tree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside the discovery notifier thread, prior to firing the discovery custom event,
+        // so it is safe to set a new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Cache group IDs mapped to the sets of partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
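+
+    // A minimal usage sketch of the public API (assuming the Ignite#snapshot() facade is wired to
+    // this manager): ignite.snapshot().createSnapshot("backup_snapshot").get();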
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the current checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Cache group IDs mapped to the sets of partitions to request.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
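+
+    // A minimal usage sketch (grpId and partIds are hypothetical): request the given partitions
+    // of a cache group from a remote node and log each received partition file:
+    //   mgr.createRemoteSnapshot(rmtNodeId, F.asMap(grpId, partIds),
+    //       (file, pair) -> log.info("Received " + pair + " -> " + file.getName()));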
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Id of the node which initiated the snapshot operation.
+     * @param parts Cache group IDs mapped to the sets of partitions to be snapshot.
+     * @param snpSndr Snapshot sender instance which processes the collected snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create an IO interface over page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which a snapshot request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name.
+     * @return Relative path of the persistent data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Partitions to be received, mapped to their page stores. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter showing how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file stores were processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close page stores that have not finished recovery.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks sequentially, though not necessarily on a single thread.
+     * It is important for some {@link SnapshotSender}s to process their sub-tasks sequentially,
+     * since all of these sub-tasks may share a single socket channel used to send data.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param log Ignite logger.
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
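+
+    // Behavior sketch: tasks submitted to the wrapper run strictly one after another, even though
+    // the delegate pool may be multi-threaded (sendPart/sendDelta are hypothetical):
+    //   Executor seq = new SequentialExecutorWrapper(log, snpRunner);
+    //   seq.execute(() -> sendPart());   // runs first
+    //   seq.execute(() -> sendDelta());  // starts only after the previous task completes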
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory calculated from the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Source file to copy from.
+         * @param to Destination file to copy data to.
+         * @param length Number of bytes to copy from the beginning of the file.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
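+                // transferTo() may copy fewer bytes than requested, so loop until the full length is written.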
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        @GridToStringInclude
+        /** The list of cache groups to include into snapshot. */
+        private final List<Integer> grpIds;
+
+        @GridToStringInclude
+        /** The list of affected by snapshot operation baseline nodes. */
 
 Review comment:
   Please reorder comment and annotation.
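  
  For example, a sketch of the suggested ordering (javadoc first, then the annotation):
  
      /** The list of cache groups to include into snapshot. */
      @GridToStringInclude
      private final List<Integer> grpIds;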

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410149199
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -354,6 +354,42 @@ public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
             err_msg);
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteWithNodeFiler() throws Exception {
+        int grids = 3;
+        CacheConfiguration<Integer, Integer> ccfg = txCacheConfig(new CacheConfiguration<Integer, Integer>(DEFAULT_CACHE_NAME))
+            .setNodeFilter(node -> node.consistentId().toString().endsWith("1"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration()));
+
+        IgniteEx ig0 = grid(0);
+        ig0.cluster().baselineAutoAdjustEnabled(false);
+        ig0.cluster().state(ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.getOrCreateCache(ccfg).put(i, i);
+
+        forceCheckpoint();
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409044002
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint that starts the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default temporary snapshot directory for loading snapshots from remote nodes. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, but off-heap
+     * memory may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
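+
+    // Usage sketch (illustrative; {@code store} and {@code deltaIo} are hypothetical
+    // handles, not part of this class): each snapshot thread fetches its single
+    // page-sized direct buffer instead of allocating a new one per writer:
+    //     ByteBuffer pageBuf = locBuff.get(); // one reusable off-heap buffer per thread
+    //     pageBuf.clear();
+    //     store.read(pageId, pageBuf, true);  // copy the page before it is overwritten
+    //     pageBuf.flip();
+    //     deltaIo.writeFully(pageBuf);        // append the copy to the partition delta file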
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
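+
+    // Two-phase flow sketch for the processes wired above: the coordinator broadcasts
+    // START_SNAPSHOT, every baseline node runs initLocalSnapshotStartStage() and answers
+    // with a SnapshotOperationResponse; once all responses (or errors) are collected,
+    // END_SNAPSHOT finalizes the snapshot or cleans up partial files:
+    //     startSnpProc.start(requestId, request);       // phase 1 on all baseline nodes
+    //     endSnpProc.start(UUID.randomUUID(), request); // phase 2, started by coordinator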
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
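+
+    // For example, assuming the default templates "part-%d.bin" (PART_FILE_TEMPLATE)
+    // and "index.bin" (INDEX_FILE_NAME):
+    //     partDeltaFileName(5)               -> "part-5.bin.delta"
+    //     partDeltaFileName(INDEX_PARTITION) -> "index.bin.delta"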
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
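+
+        // Failure-handling sketch: when a baseline node leaves mid-snapshot, every local
+        // task tied to that snapshot is failed with a ClusterTopologyCheckedException;
+        // the coordinator then observes the error in processLocalSnapshotStartStageResult()
+        // and starts the END_SNAPSHOT phase with hasErr set, so partial files are removed.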
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
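+
+            // Recovery sketch for a received partition: the raw partition file arrives
+            // first through fileHandler(), then its delta pages are applied page by page
+            // by the consumer above:
+            //     pageStore.beginRecover();
+            //     pageStore.write(PageIO.getPageId(buf), buf, 0, false); // per delta page
+            //     pageStore.finishRecover(); // partition is consistent again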
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory to delete.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
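+
+    // Layout being cleaned up (a sketch, assuming the default folder names):
+    //     <snpDir>/binary_meta/<folderName>/...           // binary metadata
+    //     <snpDir>/db/<folderName>/<cacheDir>/part-N.bin  // partition files
+    //     <snpDir>/marshaller/...                         // marshaller mappings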
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group id to the collection of its partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
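+
+    // For example, a request for cache groups [G1, G2] registers the task with
+    //     parts = { G1 -> null, G2 -> null }
+    // where a null partition set is assumed to mean "all partitions owned by this node".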
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
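+
+    // Example (sketch): with snapshots "backup_1" and "backup_2" stored locally, this
+    // returns ["backup_1", "backup_2"] -- simply the subdirectory names of the local
+    // snapshot work directory.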
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
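+
+    // Typical user-facing call (a sketch; the returned future completes when the
+    // cluster-wide snapshot has finished on all baseline nodes):
+    //     Ignite ignite = Ignition.ignite();
+    //     ignite.snapshot().createSnapshot("backup_2020_04_01").get();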
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which has not been completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
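+
+    // Crash-recovery sketch: SNP_RUNNING_KEY is assumed to be written to the metastorage
+    // when a local snapshot task starts and removed once the operation completes, so
+    // finding it here on restart means the previous attempt died mid-snapshot:
+    //     metaStorage.write(SNP_RUNNING_KEY, snpName); // at snapshot start (assumed)
+    //     metaStorage.remove(SNP_RUNNING_KEY);         // on successful finish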
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
 
 Review comment:
   Fixed


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408784676
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/platform/PlatformDeployServiceTask.java
 ##########
 @@ -18,6 +18,11 @@
 package org.apache.ignite.platform;
 
 import java.sql.Timestamp;
+import java.util.ArrayList;
 
 Review comment:
   No change, only imports


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408327958
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1906 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
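+ * <p>
+ * A minimal usage sketch (assuming the public {@code Ignite#snapshot()} facade delegates to this
+ * manager; the snapshot name is arbitrary):
+ * <pre>{@code
+ * // Request a cluster-wide snapshot and wait for its completion.
+ * IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("snapshot_01");
+ * fut.get();
+ * }</pre>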
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint that is started to enforce a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process has occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
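+        // Two distributed processes back a cluster-wide snapshot: the start stage launches local
+        // snapshot tasks on every baseline node, and the end stage finalizes the results or
+        // deletes uncompleted files cluster-wide.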
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
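+     * Example (assuming {@code PART_FILE_TEMPLATE} resolves to {@code part-%d.bin} and
+     * {@code INDEX_FILE_NAME} to {@code index.bin}): {@code partDeltaFileName(1)} yields
+     * {@code part-1.bin.delta}, while {@code partDeltaFileName(INDEX_PARTITION)} yields
+     * {@code index.bin.delta}.
+     *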
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverses the snapshot directory looking for the given local node folder name
+     * and recursively deletes all its files if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from the file tree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Cache group ids mapped to the sets of partitions to be snapshotted (null means all partitions of the group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation started.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "The previous snapshot operation was not completed."));
+            }
+
+            if (clusterSnpRq != null) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Parallel snapshot processes are not allowed."));
+            }
+
+            if (getSnapshots().contains(name))
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Snapshot with given name already exists."));
+
+            snpFut0 = new GridFutureAdapter<>();
+
+            clusterSnpFut = snpFut0;
+        }
+
+        List<Integer> grps = cctx.cache().persistentGroups().stream()
+            .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+            .filter(g -> !g.config().isEncryptionEnabled())
+            .map(CacheGroupDescriptor::groupId)
+            .collect(Collectors.toList());
+
+        List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+        startSnpProc.start(UUID.randomUUID(), new SnapshotOperationRequest(cctx.localNodeId(),
+            name,
+            grps,
+            new HashSet<>(F.viewReadOnly(srvNodes,
+                F.node2id(),
+                (node) -> CU.baselineNode(node, clusterState)))));
+
+        if (log.isInfoEnabled())
+            log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+        return new IgniteFutureImpl<>(snpFut0);
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
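+     * Requests a snapshot of the given cache partitions from a remote node; each received partition
+     * file is passed to the given consumer. A usage sketch (hypothetical identifiers, illustration only):
+     * <pre>{@code
+     * Map<Integer, Set<Integer>> parts = new HashMap<>();
+     * parts.put(CU.cacheId("some-group"), new HashSet<>(Arrays.asList(0, 1)));
+     *
+     * snpMgr.createRemoteSnapshot(rmtNodeId, parts,
+     *     (file, pair) -> log.info("Received " + file + " for " + pair)).get();
+     * }</pre>
+     *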
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check that the node is still alive before querying its supported features to avoid an NPE.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshotted.
+     * @param snpSndr Snapshot sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> igniteCacheStoragePath(pdsSettings),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which a request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String igniteCacheStoragePath(PdsFolderSettings pcfg) {
+        return Paths.get(DB_DEFAULT_FOLDER, pcfg.folderName()).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter showing how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storage processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close unfinished file page stores.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
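+            // Remove the temporary directory of this remote snapshot request.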
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor that runs tasks sequentially, although not necessarily on a single thread.
+     * It is important for some {@link SnapshotSender}s to process their sub-tasks sequentially,
+     * since all of these sub-tasks may share a single socket channel used to send data.
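+     * <p>
+     * A minimal illustration (hypothetical task bodies, sketch only):
+     * <pre>{@code
+     * Executor seq = new SequentialExecutorWrapper(log, pool);
+     * seq.execute(() -> sendPartition()); // Runs first.
+     * seq.execute(() -> sendDelta());     // Starts only after sendPartition() completes.
+     * }</pre>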
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sender tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory calculated on snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, igniteCacheStoragePath(pdsSettings));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Copy from file.
+         * @param to Copy data to file.
+         * @param length Number of bytes to copy from beginning.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The list of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if an execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
 
 Review comment:
   Removed.
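
For context on the sendDelta0 logic in the hunk above: it replays a delta-pages file onto the copied partition through FilePageStore recovery. A simplified, self-contained sketch of the same page-replay idea with plain NIO follows; the pageOffset() helper and the pageId-at-offset-0 layout are hypothetical stand-ins for FilePageStore internals, not the actual Ignite code path.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    /** Simplified sketch: replay a delta-pages file onto a partition copy. */
    public class DeltaApplySketch {
        /** Reads the delta file page by page and overwrites the matching pages in the partition file. */
        public static void applyDelta(Path delta, Path part, int pageSize) throws IOException {
            ByteBuffer pageBuf = ByteBuffer.allocate(pageSize).order(ByteOrder.nativeOrder());

            try (FileChannel src = FileChannel.open(delta, StandardOpenOption.READ);
                 FileChannel dest = FileChannel.open(part, StandardOpenOption.WRITE)) {
                long total = src.size();

                assert total % pageSize == 0 : "Delta file size must be page-aligned: " + total;

                for (long pos = 0; pos < total; pos += pageSize) {
                    pageBuf.clear();

                    // Read the next full page from the delta file.
                    while (pageBuf.hasRemaining())
                        src.read(pageBuf, pos + pageBuf.position());

                    pageBuf.flip();

                    // Assumption for illustration: the pageId is stored in the first 8 bytes of each page.
                    long pageId = pageBuf.getLong(0);

                    // Overwrite the page at its position in the partition file.
                    dest.write(pageBuf, pageOffset(pageId, pageSize));
                }
            }
        }

        /** Hypothetical pageId -> file offset mapping (the real logic lives in FilePageStore). */
        private static long pageOffset(long pageId, int pageSize) {
            long pageIdx = pageId & 0xFFFFFFFFL; // Assumption: low 32 bits hold the page index.

            return pageIdx * (long)pageSize;
        }
    }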


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r405151491
 
 

 ##########
 File path: modules/platforms/dotnet/Apache.Ignite.Core.Tests/Services/ServicesTest.cs
 ##########
 @@ -870,20 +870,6 @@ public void TestCallJavaService()
                 binSvc.testBinaryObject(
                     Grid1.GetBinary().ToBinary<IBinaryObject>(new PlatformComputeBinarizable {Field = 6}))
                     .GetField<int>("Field"));
-            
 
 Review comment:
   Thanks, I've reverted changes. This was an incorrect merge with the master branch.


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409741355
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start grid node with data before each test.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Collection of cache group and partition pairs to be included in the snapshot.
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently the cluster has only a single node),
+        // so snapshot partitions must contain the same data as the partitions left
+        // after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency is verified by starting a test node right from the snapshot directory and
+     * checking that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start grid node with data before each test.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task, but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; it must be included into the snapshot with the correct value multiplier.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, valMultiplier * i));
+
+            // Snapshot is still in the INIT state. beforeCheckpoint has been skipped
+            // since a checkpoint is already running, so we need to schedule the next one
+            // right after the current one completes.
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, SNAPSHOT_NAME));
+
+            snpFutTask.awaitStarted();
+
+            db.forceCheckpoint("snapshot is ready to be created")
+                .futureFor(CheckpointState.MARKER_STORED_TO_DISK)
+                .get();
+
+            // Change data after snapshot.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, 3 * i));
+
+            // On the next checkpoint, the snapshot must copy pages to the delta file before writing them to a partition.
+            forceCheckpoint(ig);
+
+            slowCopy.countDown();
+
+            snpFutTask.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // Now can stop the node and check created snapshots.
+        stopGrid(0);
+
+        cleanPersistenceDir(ig.name());
+
+        // Start Ignite instance from snapshot directory.
+        IgniteEx ig2 = startGridsFromSnapshot(1, SNAPSHOT_NAME);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++) {
+            assertEquals("snapshot data consistency violation [key=" + i + ']',
+                i * valMultiplier, ((TestOrderItem)ig2.cache(DEFAULT_CACHE_NAME).get(i)).value);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitionNotEnoughSpace() throws Exception {
+        String err_msg = "Test exception. Not enough space.";
+        AtomicInteger throwCntr = new AtomicInteger();
+        RandomAccessFileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+        IgniteEx ig = startGridWithCache(dfltCacheCfg.setAffinity(new ZeroPartitionAffinityFunction()),
+            CACHE_KEYS_RANGE);
+
+        // Change data after backup.
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, 2 * i);
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+
+        IgniteSnapshotManager mgr = snp(ig);
+
+        mgr.ioFactory(new FileIOFactory() {
+            @Override public FileIO create(File file, OpenOption... modes) throws IOException {
+                FileIO fileIo = ioFactory.create(file, modes);
+
+                if (file.getName().equals(IgniteSnapshotManager.partDeltaFileName(0)))
+                    return new FileIODecorator(fileIo) {
+                        @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
+                            if (throwCntr.incrementAndGet() == 3)
+                                throw new IOException(err_msg);
+
+                            return super.writeFully(srcBuf);
+                        }
+                    };
+
+                return fileIo;
+            }
+        });
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        // Check the right exception thrown.
+        assertThrowsAnyCause(log,
+            snpFut::get,
+            IOException.class,
+            err_msg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
+        String err_msg = "Test. Fail to copy partition: ";
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), new HashSet<>(Collections.singletonList(0)));
+
+        IgniteSnapshotManager mgr0 = snp(ig);
+
+        IgniteInternalFuture<?> fut = startLocalSnapshotTask(ig.context().cache().context(),
+            SNAPSHOT_NAME,
+            parts,
+            new DelegateSnapshotSender(log, mgr0.snapshotExecutorService(),
+                mgr0.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    if (pair.getPartitionId() == 0)
+                        throw new IgniteException(err_msg + pair);
+
+                    delegate.sendPart0(part, cacheDirName, pair, length);
+                }
+            });
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteException.class,
+            err_msg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemotePartitionsWithLoad() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        AtomicInteger cntr = new AtomicInteger();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, cntr.incrementAndGet());
+
+        GridCacheSharedContext<?, ?> cctx1 = grid(1).context().cache().context();
+        GridCacheDatabaseSharedManager db1 = (GridCacheDatabaseSharedManager)cctx1.database();
+
+        forceCheckpoint();
+
+        Map<String, Integer> rmtPartCRCs = new HashMap<>();
+        CountDownLatch cancelLatch = new CountDownLatch(1);
+
+        db1.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                SnapshotFutureTask task = cctx1.snapshotMgr().lastScheduledRemoteSnapshotTask(grid(0).localNode().id());
+
+                // Skip the first remote snapshot creation since it will be cancelled.
+                if (task == null || cancelLatch.getCount() > 0)
+                    return;
+
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(task,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty()) {
+                    assert rmtPartCRCs.isEmpty();
+
+                    // Calculate actual partition CRCs when the checkpoint finishes on this node.
+                    ctx.finishedStateFut().listen(f -> {
+                        File cacheWorkDir = ((FilePageStoreManager)grid(1).context().cache().context().pageStore())
+                            .cacheWorkDir(dfltCacheCfg);
+
+                        rmtPartCRCs.putAll(calculateCRC32Partitions(cacheWorkDir));
+                    });
+                }
+            }
+        });
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+        Map<String, Integer> snpPartCRCs = new HashMap<>();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted())
+                ig0.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(), cntr.incrementAndGet());
+        }, 5, "cache-loader-");
+
+        try {
+            // Snapshot must be taken on node1 and transmitted to node0.
+            IgniteInternalFuture<?> fut = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                new BiConsumer<File, GroupPartitionId>() {
+                    @Override public void accept(File file, GroupPartitionId gprPartId) {
+                        log.info("Snapshot partition received successfully [rmtNodeId=" + rmtNodeId +
+                            ", part=" + file.getAbsolutePath() + ", gprPartId=" + gprPartId + ']');
+
+                        cancelLatch.countDown();
+                    }
+                });
+
+            cancelLatch.await();
+
+            fut.cancel();
+
+            IgniteInternalFuture<?> fut2 = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                (part, pair) -> {
+                    try {
+                        snpPartCRCs.put(part.getName(), FastCrc.calcCrc(part));
+                    }
+                    catch (IOException e) {
+                        throw new IgniteException(e);
+                    }
+                });
+
+            fut2.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        assertEquals("Partitions from remote node must have the same CRCs as those which have been received",
+            rmtPartCRCs, snpPartCRCs);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteOnBothNodes() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        forceCheckpoint(ig0);
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+        IgniteSnapshotManager mgr1 = snp(grid(1));
+
+        UUID node0 = grid(0).localNode().id();
+        UUID node1 = grid(1).localNode().id();
+
+        Map<Integer, Set<Integer>> fromNode1 = owningParts(ig0,
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node1);
+
+        Map<Integer, Set<Integer>> fromNode0 = owningParts(grid(1),
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> futFrom1To0 = mgr0.requestRemoteSnapshot(node1, fromNode1,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode1.get(pair.getGroupId())
+                    .remove(pair.getPartitionId())));
+        IgniteInternalFuture<?> futFrom0To1 = mgr1.requestRemoteSnapshot(node0, fromNode0,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode0.get(pair.getGroupId())
+                .remove(pair.getPartitionId())));
+
+        futFrom0To1.get();
+        futFrom1To0.get();
+
+        assertTrue("Not all of partitions have been received: " + fromNode1,
+            fromNode1.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+        assertTrue("Not all of partitions have been received: " + fromNode0,
+            fromNode0.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = ClusterTopologyCheckedException.class)
+    public void testRemoteSnapshotRequestedNodeLeft() throws Exception {
+        IgniteEx ig0 = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+        IgniteEx ig1 = startGrid(1);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        CountDownLatch hold = new CountDownLatch(1);
+
+        ((GridCacheDatabaseSharedManager)ig1.context().cache().context().database())
+            .addCheckpointListener(new DbCheckpointListener() {
+                /** {@inheritDoc} */
+                @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
+                    // Listener will be executed inside the checkpoint thread.
+                    U.await(hold);
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onMarkCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+            });
+
+        UUID rmtNodeId = ig1.localNode().id();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        snp(ig0).requestRemoteSnapshot(rmtNodeId, parts, (part, grp) -> {});
+
+        IgniteInternalFuture<?>[] futs = new IgniteInternalFuture[1];
+
+        assertTrue(GridTestUtils.waitForCondition(new GridAbsPredicate() {
+            @Override public boolean apply() {
+                IgniteInternalFuture<Boolean> snpFut = snp(ig1)
+                    .lastScheduledRemoteSnapshotTask(ig0.localNode().id());
+
+                if (snpFut == null)
+                    return false;
+                else
+                    futs[0] = snpFut;
+
+                return true;
+            }
+        }, 5_000L));
+
+        stopGrid(0);
+
+        hold.countDown();
+
+        futs[0].get();
+    }
+
+    /**
+     * <pre>
+     * 1. Start 2 nodes.
+     * 2. Request a snapshot from the 2nd node.
+     * 3. Block the snapshot-request message.
+     * 4. Start a 3rd node and change the BLT.
+     * 5. Stop the 3rd node and change the BLT.
+     * 6. The 2nd node now has MOVING partitions to be preloaded.
+     * 7. Release the snapshot-request message.
+     * 8. Snapshot creation should fail since MOVING partitions cannot be snapshotted.
+     * </pre>
+     *
+     * @throws Exception If fails.
+     */
+    @Test(expected = IgniteCheckedException.class)
+    public void testRemoteOutdatedSnapshot() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        awaitPartitionMapExchange();
+
+        forceCheckpoint();
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> snpFut = mgr0.requestRemoteSnapshot(rmtNodeId,
+            owningParts(ig0, new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))), rmtNodeId),
+            (part, grp) -> {});
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .waitForBlocked();
+
+        startGrid(2);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        stopGrid(2);
+
+        TestRecordingCommunicationSpi.spi(grid(1))
+            .blockMessages((node, msg) ->  msg instanceof GridDhtPartitionDemandMessage);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);
+
+        snpFut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = IgniteCheckedException.class)
+    public void testLocalSnapshotOnCacheStopped() throws Exception {
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(1);
+
+        ig.cluster().state(ClusterState.ACTIVE);
+
+        awaitPartitionMapExchange();
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        CountDownLatch cpLatch = new CountDownLatch(1);
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(cpLatch);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        IgniteCache<?, ?> cache = ig.getOrCreateCache(DEFAULT_CACHE_NAME);
+
+        cache.destroy();
+
+        cpLatch.countDown();
+
+        snpFut.get(5_000, TimeUnit.MILLISECONDS);
+    }
+
+    /**
+     * @param src Source node to calculate.
+     * @param grps Groups to collect owning parts.
+     * @param rmtNodeId Remote node id.
+     * @return Map of collected parts.
+     */
+    private static Map<Integer, Set<Integer>> owningParts(IgniteEx src, Set<Integer> grps, UUID rmtNodeId) {
+        Map<Integer, Set<Integer>> result = new HashMap<>();
+
+        for (Integer grpId : grps) {
+            Set<Integer> parts = src.context()
+                .cache()
+                .cacheGroup(grpId)
+                .topology()
+                .partitions(rmtNodeId)
+                .entrySet()
+                .stream()
+                .filter(p -> p.getValue() == GridDhtPartitionState.OWNING)
+                .map(Map.Entry::getKey)
+                .collect(Collectors.toSet());
+
+            result.put(grpId, parts);
+        }
+
+        return result;
+    }
+
+    /**
+     * @param cctx Cache shared context.
+     * @param snpName Unique snapshot name.
+     * @param parts Collection of cache group and partition pairs to be included in the snapshot.
+     * @param snpSndr Sender which is used for snapshot sub-task processing.
+     * @return Future which will be completed when snapshot is done.
+     */
+    private static SnapshotFutureTask startLocalSnapshotTask(
+        GridCacheSharedContext<?, ?> cctx,
+        String snpName,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) throws IgniteCheckedException {
+        SnapshotFutureTask snpFutTask = cctx.snapshotMgr().registerSnapshotTask(snpName, cctx.localNodeId(), parts, snpSndr);
+
+        snpFutTask.start();
+
+        // Snapshot is still in the INIT state. beforeCheckpoint has been skipped
+        // since a checkpoint is already running, so we need to schedule the next one
+        // right after the current one completes.
+        cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, snpName));
+
+        snpFutTask.awaitStarted();
+
+        return snpFutTask;
+    }
+
+    /** */
+    private static class ZeroPartitionAffinityFunction extends RendezvousAffinityFunction {
+        @Override public int partition(Object key) {
+            return 0;
+        }
+    }
+
+    /** */
+    private static class TestOrderItem implements Serializable {
+        /** Serial version. */
+        private static final long serialVersionUID = 0L;
+
+        /** Order key. */
+        private final int key;
+
+        /** Order value. */
+        private final int value;
+
+        public TestOrderItem(int key, int value) {
 
 Review comment:
   Fixed.
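
A note on the CRC checks used throughout these tests: calculateCRC32Partitions builds a map from file name to CRC32 for a directory, so that original and snapshot directories can be compared. A minimal stand-in sketch follows; it computes a plain whole-file CRC32 via java.util.zip, whereas the real helper may skip stored page CRC fields, and the ".bin" filter is an assumption about partition file naming.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.stream.Stream;
    import java.util.zip.CRC32;

    /** Sketch of a per-file CRC32 map over partition files in a directory. */
    public class PartitionCrcSketch {
        /** @return Map of file name to its CRC32 value for every partition file in {@code dir}. */
        public static Map<String, Integer> calcCrc32(Path dir) throws IOException {
            Map<String, Integer> crcs = new HashMap<>();

            try (Stream<Path> files = Files.list(dir)) {
                for (Path p : (Iterable<Path>)files::iterator) {
                    // Assumption: partition files use the '.bin' extension.
                    if (!p.getFileName().toString().endsWith(".bin"))
                        continue;

                    CRC32 crc = new CRC32();
                    byte[] buf = new byte[8192];

                    try (InputStream in = Files.newInputStream(p)) {
                        int read;

                        while ((read = in.read(buf)) > 0)
                            crc.update(buf, 0, read);
                    }

                    crcs.put(p.getFileName().toString(), (int)crc.getValue());
                }
            }

            return crcs;
        }
    }

Two maps produced this way can then be compared with assertEquals, as the tests above do for the original and snapshot cache directories.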


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205545
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java
 ##########
 @@ -937,6 +939,19 @@ else if (msg instanceof WalStateAbstractMessage)
             for (PartitionsExchangeAware comp : cctx.exchange().exchangeAwareComponents())
                 comp.onInitAfterTopologyLock(this);
 
+            // For pme-free exchanges onInitAfterTopologyLock must be
+            // invoked prior to onDoneBeforeTopologyUnlock
 
 Review comment:
   Point


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409020105
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheProcessor.java
 ##########
 @@ -1994,7 +1997,7 @@ private GridCacheContext prepareCacheContext(
      *
      * @param cctx Cache context.
      */
-    private void stopCacheSafely(GridCacheContext<?, ?> cctx) {
+    public void stopCacheSafely(GridCacheContext<?, ?> cctx) {
 
 Review comment:
   Fixed


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408253133
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1906 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter} would be redundant and could lead to OOM errors). A direct
+     * buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory could be exhausted
+     * before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if an uncompleted snapshot was found during the recovery process. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
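+        // The cluster snapshot is driven by a two-phase distributed process: the START_SNAPSHOT
+        // phase triggers local snapshot tasks on each baseline node, and the END_SNAPSHOT phase
+        // finalizes the results (deleting uncompleted snapshot files if any node failed).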
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
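+     * <p>
+     * For example (assuming the standard partition file template {@code part-N.bin}),
+     * {@code partDeltaFileName(5)} yields {@code "part-5.bin.delta"}.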
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when #takeSnapshot() method already invoked and distributed process
+                        // starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread, so no ordering guarantees are violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Traverse the snapshot directory for the given local node folder name (tolerating files
+     * being removed concurrently) and recursively delete all its files if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group ids and the corresponding cache partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
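+        // A null partition set means that all owned partitions of the group are included
+        // (an assumption based on how SnapshotFutureTask interprets the mapping).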
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot will be finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation started.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "The previous snapshot operation was not completed."));
+            }
+
+            if (clusterSnpRq != null) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Parallel snapshot processes are not allowed."));
+            }
+
+            if (getSnapshots().contains(name))
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Snapshot with given name already exists."));
+
+            snpFut0 = new GridFutureAdapter<>();
+
+            clusterSnpFut = snpFut0;
+        }
+
+        List<Integer> grps = cctx.cache().persistentGroups().stream()
+            .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+            .filter(g -> !g.config().isEncryptionEnabled())
+            .map(CacheGroupDescriptor::groupId)
+            .collect(Collectors.toList());
+
+        List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+        startSnpProc.start(UUID.randomUUID(), new SnapshotOperationRequest(cctx.localNodeId(),
+            name,
+            grps,
+            new HashSet<>(F.viewReadOnly(srvNodes,
+                F.node2id(),
+                (node) -> CU.baselineNode(node, clusterState)))));
+
+        if (log.isInfoEnabled())
+            log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+        return new IgniteFutureImpl<>(snpFut0);
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
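+     * Requests a snapshot of the given cache group partitions from a remote node; each received
+     * partition file is handed to {@code partConsumer}. A minimal usage sketch ({@code snpMgr},
+     * {@code rmtNodeId} and the cache group name are illustrative):
+     * <pre>{@code
+     * Map<Integer, Set<Integer>> parts = new HashMap<>();
+     * parts.put(CU.cacheId("someGroup"), Collections.singleton(0));
+     *
+     * snpMgr.createRemoteSnapshot(rmtNodeId, parts,
+     *     (file, pair) -> log.info("Partition received: " + file.getName()));
+     * }</pre>
+     *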
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Collection of cache group ids and the corresponding cache partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check the node is still in topology prior to accessing its attributes.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Collection of cache group ids and the corresponding cache partitions to be snapshotted.
+     * @param snpSndr Snapshot sender instance used to process snapshot sub-tasks.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmissionSender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> igniteCacheStoragePath(pdsSettings),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which requests have been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String igniteCacheStoragePath(PdsFolderSettings pcfg) {
+        return Paths.get(DB_DEFAULT_FOLDER, pcfg.folderName()).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storages were processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close unfinished file storages.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor that runs tasks sequentially, although not necessarily on a single thread.
+     * This is important for some {@link SnapshotSender}s which must process sub-tasks
+     * sequentially, because all these sub-tasks may share a single socket channel used to
+     * send data.
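+     * <p>
+     * Illustrative behavior ({@code pool} and {@code sendChunk} are hypothetical names):
+     * <pre>{@code
+     * Executor seq = new SequentialExecutorWrapper(log, pool);
+     *
+     * seq.execute(() -> sendChunk(1)); // Runs first.
+     * seq.execute(() -> sendChunk(2)); // Runs strictly after the first task completes.
+     * }</pre>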
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor shutdown.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partition has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved under the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, igniteCacheStoragePath(pdsSettings));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition snapshot has been taken [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page from delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error", th);
+            }
+        }
+
+        /**
+         * @param from Source file to copy from.
+         * @param to Destination file to copy data to.
+         * @param length Number of bytes to copy from the beginning of the source file.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy doesn't have enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into the snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if the execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into the snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
 
 Review comment:
   Do we really need to override `equals()` and `hashCode()`? We don't use these methods explicitly and never use the object as a key in hash maps. Did I miss something?
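
   For reference, overriding these would only pay off if the request were ever used as a
   hash-based key, e.g. to deduplicate concurrent requests by snapshot name. A minimal,
   hypothetical sketch (the `Request` class and map usage below are illustrative only and
   not part of this PR):

       import java.util.Objects;
       import java.util.concurrent.ConcurrentHashMap;
       import java.util.concurrent.ConcurrentMap;

       public class SnapshotRequestDedup {
           /** Simplified stand-in for SnapshotOperationRequest. */
           static final class Request {
               final String snpName;

               Request(String snpName) {
                   this.snpName = snpName;
               }

               /** Value-based equality is what makes map-key deduplication work. */
               @Override public boolean equals(Object o) {
                   return this == o || (o instanceof Request && Objects.equals(snpName, ((Request)o).snpName));
               }

               @Override public int hashCode() {
                   return Objects.hash(snpName);
               }
           }

           public static void main(String[] args) {
               ConcurrentMap<Request, String> inFlight = new ConcurrentHashMap<>();

               inFlight.putIfAbsent(new Request("backup23012020"), "STARTED");

               // Without equals()/hashCode() this second instance would be a distinct
               // key and the duplicate request would go undetected.
               String prev = inFlight.putIfAbsent(new Request("backup23012020"), "DUPLICATE");

               System.out.println(prev); // Prints "STARTED": the duplicate was detected.
           }
       }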

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r407968807
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint that starts the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}): the latter is redundant and can lead to OOM errors. A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from the remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message with the snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try stop all snapshot processing if not yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverse the snapshot directory for given local node folder name and
 
 Review comment:
   See no concurrency here
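
   For comparison, a sketch of a plain sequential walk versus the same walk done in
   parallel; the path below is a placeholder and this is not code from the PR:

       import java.io.IOException;
       import java.nio.file.Files;
       import java.nio.file.Path;
       import java.nio.file.Paths;
       import java.util.stream.Stream;

       public class SnapshotDirTraversal {
           public static void main(String[] args) throws IOException {
               Path snpDir = Paths.get("work", "snapshots", "backup23012020"); // Placeholder path.

               // Sequential traversal: one thread visits every file in order.
               try (Stream<Path> files = Files.walk(snpDir)) {
                   files.filter(Files::isRegularFile).forEach(System.out::println);
               }

               // Concurrent traversal: the same walk fanned out over the common fork-join pool.
               try (Stream<Path> files = Files.walk(snpDir)) {
                   files.parallel().filter(Files::isRegularFile).forEach(System.out::println);
               }
           }
       }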

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410191925
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotSender.java
 ##########
 @@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Executor;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.jetbrains.annotations.Nullable;
+
+/**
+ *
+ */
+abstract class SnapshotSender {
+    /** Busy processing lock. */
+    private final ReadWriteLock lock = new ReentrantReadWriteLock();
+
+    /** Executor to run operation at. */
+    private final Executor exec;
+
+    /** {@code true} if sender is currently working */
 
 Review comment:
   Point

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409739291
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if node should be started in separate jvm. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take snapshot from the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // zero out the sign bit
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+                }
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                throw new RuntimeException(e);
+            }
+        }, 3, "cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(snpName);
+
+            U.await(loadLatch, 10, TimeUnit.SECONDS);
+
+            fut.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // The cluster can be deactivated, but we must test snapshot restore when binary recovery has also occurred.
+        stopAllGrids();
+
+        assertTrue("Snapshot directory must be empty for node not in baseline topology: " + notBltDirName,
+            !searchDirectoryRecursively(locSnpDir.toPath(), notBltDirName).isPresent());
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, snpName);
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + DEFAULT_CACHE_NAME,
+            CACHE_KEYS_RANGE, snpIg0.cache(DEFAULT_CACHE_NAME).size());
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + txCcfg.getName(),
+            CACHE_KEYS_RANGE, snpIg0.cache(txCcfg.getName()).size());
+
+        snpIg0.cache(DEFAULT_CACHE_NAME).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contain only negative values " +
+                "[cache=" + DEFAULT_CACHE_NAME + ", entry=" + e + ']', (Integer)e.getValue() < 0));
+
+        snpIg0.cache(txCcfg.getName()).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contain only negative values " +
+                "[cache=" + txCcfg.getName() + ", entry=" + e + ']', (Integer)e.getValue() < 0));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotPrimaryBackupsTheSame() throws Exception {
+        int grids = 3;
+        AtomicInteger cacheKey = new AtomicInteger();
+
+        IgniteEx ignite = startGridsWithCache(grids, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        IgniteInternalFuture<Long> atLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                grid(gId).cache(DEFAULT_CACHE_NAME)
+                    .put(cacheKey.incrementAndGet(), 0);
+            }
+        }, 5, "atomic-cache-put-");
+
+        IgniteInternalFuture<Long> txLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                IgniteCache<Integer, Integer> txCache = grid(gId).getOrCreateCache(txCcfg);
+
+                try (Transaction tx = grid(gId).transactions().txStart()) {
+                    txCache.put(cacheKey.incrementAndGet(), 0);
+
+                    tx.commit();
+                }
+            }
+        }, 5, "tx-cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+            fut.get();
+        }
+        finally {
+            txLoadFut.cancel();
+            atLoadFut.cancel();
+        }
+
+        stopAllGrids();
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, cfg -> resolveSnapshotWorkDirectory(cfg).getAbsolutePath(), SNAPSHOT_NAME, false);
+
+        // Block whole rebalancing.
+        for (Ignite g : G.allGrids())
+            TestRecordingCommunicationSpi.spi(g).blockMessages((node, msg) -> msg instanceof GridDhtPartitionDemandMessage);
+
+        snpIg0.cluster().state(ACTIVE);
+
+        assertFalse("Primary and backup in snapshot must have the same counters. Rebalance must not happen.",
+            GridTestUtils.waitForCondition(() -> {
+                boolean hasMsgs = false;
+
+                for (Ignite g : G.allGrids())
+                    hasMsgs |= TestRecordingCommunicationSpi.spi(g).hasBlockedMessages();
+
+                return hasMsgs;
+            }, REBALANCE_AWAIT_TIME));
+
+        TestRecordingCommunicationSpi.stopBlockAll();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRejectCacheStopDuringClusterSnapshot() throws Exception {
+        // Block the full message, so the cluster-wide snapshot operation cannot be fully completed.
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> {
+            if (msg instanceof FullMessage) {
+                FullMessage<?> msg0 = (FullMessage<?>)msg;
+
+                assertEquals("Snapshot distributed process must be used",
+                    DistributedProcess.DistributedProcessType.START_SNAPSHOT.ordinal(), msg0.type());
+
+                assertTrue("Snapshot has to be finished successfully on all nodes", msg0.error().isEmpty());
+
+                return true;
+            }
+
+            return false;
+        });
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // Creation of new caches should not be blocked.
+        ignite.getOrCreateCache(dfltCacheCfg.setName("default2"))
+            .put(1, 1);
+
+        forceCheckpoint();
+
+        assertThrowsAnyCause(log,
+            () -> {
+                ignite.destroyCache(DEFAULT_CACHE_NAME);
+
+                return 0;
+            },
+            IgniteCheckedException.class,
+            SNP_IN_PROGRESS_ERR_MSG);
+
+        spi.unblock();
+
+        fut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testBltChangeDuringClusterSnapshot() throws Exception {
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(3);
+
+        long topVer = ignite.cluster().topologyVersion();
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // A node not in the baseline joins successfully.
+        String grid4Dir = folderName(startGrid(4));
+
+        // The non-baseline node leaves the cluster; the snapshot is not affected.
+        stopGrid(4);
+
+        // Client node must connect successfully.
+        startClientGrid(4);
+
+        // Changing the baseline completes successfully.
+        ignite.cluster().setBaselineTopology(topVer);
+
+        spi.unblock();
+
+        fut.get();
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + grid4Dir,
+            !searchDirectoryRecursively(snp(ignite).snapshotLocalDir(SNAPSHOT_NAME).toPath(), grid4Dir).isPresent());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotExOnInitiatorLeft() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        ignite.close();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            NodeStoppingException.class,
+            SNP_NODE_STOPPING_ERR_MSG);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotExistsException() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteException.class,
+            "Snapshot with given name already exists.");
+
+        stopAllGrids();
+
+        // Check that snapshot has not been accidentally deleted.
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotCleanedOnLeft() throws Exception {
+        CountDownLatch block = new CountDownLatch(1);
+        CountDownLatch partProcessed = new CountDownLatch(1);
+
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+        String dirNameIgnite0 = folderName(ignite);
+
+        String dirNameIgnite1 = folderName(grid(1));
+
+        snp(grid(1)).localSnapshotSenderFactory(
+            blockingLocalSnapshotSender(grid(1), partProcessed, block));
+
+        TestRecordingCommunicationSpi commSpi1 = TestRecordingCommunicationSpi.spi(grid(1));
+        commSpi1.blockMessages((node, msg) -> msg instanceof SingleNodeMessage);
+
+        IgniteFuture<?> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        U.await(partProcessed);
+
+        stopGrid(1);
+
+        block.countDown();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + dirNameIgnite0,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite0).isPresent());
+
+        startGrid(1);
+
+        awaitPartitionMapExchange();
+
+        // Snapshot directory must be cleaned.
+        assertTrue("Snapshot directory must be empty for node 1 due to snapshot future fail: " + dirNameIgnite1,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite1).isPresent());
+
+        List<String> allSnapshots = snp(ignite).getSnapshots();
+
+        assertTrue("Snapshot directory must be empty due to snapshot fail: " + allSnapshots,
+            allSnapshots.isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRecoveryClusterSnapshotJvmHalted() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        String grid0Dir = folderName(ignite);
+        String grid1Dir = folderName(grid(1));
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+
+        jvm = true;
+
+        IgniteConfiguration cfg2 = optimize(getConfiguration(getTestIgniteInstanceName(2)));
+
+        cfg2.getDataStorageConfiguration()
+            .setFileIOFactory(new HaltJvmFileIOFactory(new RandomAccessFileIOFactory(),
+                (Predicate<File> & Serializable) file -> {
+                    // Trying to create FileIO over partition file.
+                    return file.getAbsolutePath().contains(SNAPSHOT_NAME);
+                }));
+
+        startGrid(cfg2);
+
+        String grid2Dir = U.maskForFileName(cfg2.getConsistentId().toString());
+
+        jvm = false;
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        awaitPartitionMapExchange();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty: " + grid0Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid0Dir).isPresent());
+
+        assertTrue("Snapshot directory must be empty: " + grid1Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid1Dir).isPresent());
+
+        assertTrue("Snapshot directory must exist due to grid2 has been halted and cleanup not fully performed: " + grid2Dir,
+            searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        IgniteEx grid2 = startGrid(2);
+
+        assertTrue("Snapshot directory must be empty after recovery: " + grid2Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        awaitPartitionMapExchange();
+
+        assertTrue("Snapshot directory must be empty", grid2.snapshot().getSnapshots().isEmpty());
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
+            .get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithRebalancing() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi commSpi = TestRecordingCommunicationSpi.spi(ignite);
+        commSpi.blockMessages((node, msg) -> msg instanceof GridDhtPartitionSupplyMessage);
+
+        startGrid(2);
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        commSpi.waitForBlocked();
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        commSpi.stopBlock(true);
+
+        fut.get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(3, SNAPSHOT_NAME);
+
+        awaitPartitionMapExchange();
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithExplicitPath() throws Exception {
+        File exSnpDir = U.resolveWorkDirectory(U.defaultWorkDirectory(), "ex_snapshots", true);
+
+        try {
+            IgniteEx ignite = null;
+
+            for (int i = 0; i < 2; i++) {
+                IgniteConfiguration cfg = optimize(getConfiguration(getTestIgniteInstanceName(i)));
+
+                cfg.setSnapshotPath(exSnpDir.getAbsolutePath());
+
+                ignite = startGrid(cfg);
+            }
+
+            ignite.cluster().baselineAutoAdjustEnabled(false);
+            ignite.cluster().state(ACTIVE);
+
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ignite.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+            forceCheckpoint();
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409067245
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -361,19 +356,19 @@ public static String partDeltaFileName(int partId) {
         MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
 
         mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
-            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+            "The system time approximated by 10 ms of the last started cluster snapshot request on this node.");
         mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
-            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+            "The system time approximated by 10 ms of the last started cluster snapshot request on this node.");
 
 Review comment:
   The description is the same now. "approximated by 10 ms" can be omitted, but it's up to you.
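
For illustration, a sketch of how the two descriptions could be made distinct, reusing the register calls shown in the diff above (the exact wording is only a suggestion, per the comment):

    mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
        "The system time when the last cluster snapshot request was started on this node.");
    mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
        "The system time when the last cluster snapshot request was finished on this node.");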

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205599
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.lang.GridClosureException;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default directory name for storing snapshots loaded from remote nodes and temporary snapshot files. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path, including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of one buffer per
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since the latter is redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = LocalSnapshotSender::new;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of the locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpReq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult, SnapshotStartDiscoveryMessage::new);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time of the last cluster snapshot request start time on this node.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time of the last cluster snapshot request end time on this node.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot request on this node.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot request which fail with an error. " +
+                "This value will be 'null' if last snapshot request has been completed successfully.");
+        mreg.register("LocalSnapshotList", this::getSnapshots, List.class,
+            "The list of names of all snapshots currently saved on the local node with respect to " +
+                "the configured via IgniteConfiguration snapshot working path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled because another request was received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message about the snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpReq = clusterSnpReq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpReq != null &&
+                                snpReq.snpName.equals(sctx.snapshotName()) &&
+                                snpReq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("Snapshot operation interrupted. " +
+                                "One of the baseline nodes left the cluster: " + leftNodeId));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission is in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot dire
 
 Review comment:
   fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409381635
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take a snapshot of the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in the baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
 
 Review comment:
   It's better to use ThreadLocalRandom for multi-threaded code, even in tests.
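
   For reference, a minimal compilable sketch of the loop body with `ThreadLocalRandom` (the `GRIDS` constant and `loadLoop` method are illustrative stand-ins for the test's `grids` variable and load closure, not code from the patch):

   ```java
   import java.util.concurrent.ThreadLocalRandom;

   class LoadLoopSketch {
       static final int GRIDS = 3; // mirrors the grid count used by the test

       static void loadLoop() {
           while (!Thread.currentThread().isInterrupted()) {
               // ThreadLocalRandom.current() returns a per-thread generator, so
               // concurrent load workers never contend on the shared seed of a
               // single static java.util.Random instance.
               int txIdx = ThreadLocalRandom.current().nextInt(GRIDS);

               // ... run a transaction against grid(txIdx) here ...
           }
       }
   }
   ```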

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409544257
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take a snapshot of the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in the baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408001728
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of a file with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform a local snapshot. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of one buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since extra buffers are redundant and can lead to OOM errors. A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process has occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the distributed process
+                        // starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverses the snapshot directory for the given local node folder name and
+     * recursively deletes all files from it if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group and partition pairs to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void startLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
 
 Review comment:
   The name sounds like we need to start some result here, but actually this method processes a result. I think it's better to rename the method to something like `processLocalSnapshotStartStageResult`. The same for `endLocalSnapshotResult`. Also, perhaps `startLocalSnapshot` can be renamed to something like `initLocalSnapshotStartStage` for symmetry.
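
   For illustration only, the registration in the constructor would then read as below (same handlers under the suggested names; a sketch, not code from the patch):

   ```java
   // Suggested renaming of the DistributedProcess stage callbacks; method bodies stay unchanged.
   startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT,
       this::initLocalSnapshotStartStage,           // was startLocalSnapshot
       this::processLocalSnapshotStartStageResult); // was startLocalSnapshotResult

   endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT,
       this::initLocalSnapshotEndStage,             // was endLocalSnapshot
       this::processLocalSnapshotEndStageResult);   // was endLocalSnapshotResult
   ```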

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409102757
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1906 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups, triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested the operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that enforces a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}): extra buffers are redundant and can lead to OOM errors,
+     * since a direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
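+
+    // A minimal usage sketch (illustrative only, assuming PART_FILE_TEMPLATE resolves to the
+    // standard "part-%d.bin" partition file template):
+    //
+    //   partDeltaFileName(12)              -> "part-12.bin.delta"
+    //   partDeltaFileName(INDEX_PARTITION) -> "index.bin.delta"
+    //   partDeltaFile(cacheSnpDir, 12)     -> new File(cacheSnpDir, "part-12.bin.delta")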
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Traverses the snapshot directory for the given local node folder name and
+     * recursively deletes all files from it if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
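+
+    // A usage sketch (illustrative only): deleting the local copy of snapshot "backup1", where
+    // the folder name is the masked consistent id of the local node:
+    //
+    //   deleteSnapshot(snapshotLocalDir("backup1"), pdsSettings.folderName());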
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group and partition pairs to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if a snapshot operation is in progress.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
 
 Review comment:
   Fixed by checking distributed process ids on StageResult.
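
   The comment above concerns guarding against concurrent cluster snapshot operations. A minimal
   sketch of that guard shape, assuming fields similar to the patch's snpOpMux/clusterSnpFut
   (simplified, not the exact patch code):

       synchronized (snpOpMux) {
           // Reject the request while a previous cluster snapshot is still running.
           if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
               return new IgniteFinishedFutureImpl<>(
                   new IgniteException(SNP_IN_PROGRESS_ERR_MSG));
           }

           clusterSnpFut = new GridFutureAdapter<>();
       }

       // Stage results are then matched against the id of the started distributed process,
       // so responses from a stale request cannot complete the newly created future.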


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409023158
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/MarshallerContextImpl.java
 ##########
 @@ -168,27 +170,73 @@ private void initializeCaches() {
     }
 
     /**
-     * @param platformId Platform id.
-     * @param marshallerMappings All marshaller mappings for given platformId.
-     * @throws IgniteCheckedException In case of failure to process incoming marshaller mappings.
+     * @param log Ignite logger.
+     * @param mappings All marshaller mappings to write.
      */
-    public void onMappingDataReceived(byte platformId, Map<Integer, MappedName> marshallerMappings)
-        throws IgniteCheckedException
-    {
-        ConcurrentMap<Integer, MappedName> platformCache = getCacheFor(platformId);
+    public void onMappingDataReceived(IgniteLogger log, List<Map<Integer, MappedName>> mappings) {
+        addPlatformMappings(log,
+            mappings,
+            this::getCacheFor,
+            (mappedName, clsName) ->
+                mappedName == null || F.isEmpty(clsName) || !clsName.equals(mappedName.className()),
+            fileStore);
+    }
+
+    /**
+     * @param ctx Kernal context.
+     * @param mappings Marshaller mappings to save.
+     * @param dir Directory to save given mappings to.
+     */
+    public static void saveMappings(GridKernalContext ctx, List<Map<Integer, MappedName>> mappings, File dir) {
+        MarshallerMappingFileStore writer = new MarshallerMappingFileStore(ctx,
+            mappingFileStoreWorkDir(dir.getAbsolutePath()));
+
+        addPlatformMappings(ctx.log(MarshallerContextImpl.class),
+            mappings,
+            b -> new ConcurrentHashMap<>(),
+            (mappedName, clsName) -> true,
+            writer);
+    }
+
+    /**
+     * @param log Ignite logger.
+     * @param mappings List of marshaller mappings per platform id.
+     * @param mappedCache Function resolving the cache to attach new mappings to.
+     * @param cacheAddPred Predicate checking whether a mapping can be added.
+     * @param writer Persistent mapping writer.
+     */
+    private static void addPlatformMappings(
+        IgniteLogger log,
+        List<Map<Integer, MappedName>> mappings,
+        Function<Byte, ConcurrentMap<Integer, MappedName>> mappedCache,
+        BiPredicate<MappedName, String> cacheAddPred,
+        MarshallerMappingFileStore writer
+    ) {
+        if (mappings == null)
+            return;
+
+        for (byte platformId = 0; platformId < mappings.size(); platformId++) {
+            Map<Integer, MappedName> attach = mappings.get(platformId);
 
-        for (Map.Entry<Integer, MappedName> e : marshallerMappings.entrySet()) {
-            int typeId = e.getKey();
-            String clsName = e.getValue().className();
+            if (attach == null)
+                return;
 
 Review comment:
   Fixed.
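
   The fix presumably replaces the early return with a per-platform skip, so one missing platform
   mapping does not abort the remaining ones. A minimal sketch of that shape (assumed, not the
   exact committed code):

       for (byte platformId = 0; platformId < mappings.size(); platformId++) {
           Map<Integer, MappedName> attach = mappings.get(platformId);

           // Skip a platform without mappings instead of aborting the whole loop.
           if (attach == null)
               continue;

           // ... merge 'attach' into the cache resolved for this platformId.
       }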


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410033430
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -83,15 +84,12 @@
  * Cluster-wide snapshot test.
  */
 public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
-    /** Random instance. */
-    private static final Random R = new Random();
-
     /** Time to wait while rebalance may happen. */
     private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
 
     /** Cache configuration for test. */
-    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
-        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+    private static CacheConfiguration<Integer, Integer> atomicCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
 
 Review comment:
   `txCacheName` for atomic cache looks confusing


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410148881
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create clients with zero balance.
 
 Review comment:
   Fixed.
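
   The consistency check in this kind of test rests on a conserved total: each transfer moves
   money between two accounts inside a single transaction, so the sum over both caches never
   changes while the snapshot is taken. A minimal sketch of one transfer iteration, using the
   test's variables loosely (fromId, toId, rnd and the Account.balance field are assumptions,
   not the exact test code):

       try (Transaction tx = client.transactions().txStart()) {
           Account from = eastCache.get(fromId);
           Account to = westCache.get(toId);

           int amount = rnd.nextInt(transferLimit);

           from.balance -= amount;
           to.balance += amount;

           eastCache.put(fromId, from);
           westCache.put(toId, to);

           tx.commit();
       }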


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410246784
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.lang.GridClosureException;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
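+ * <p>
+ * A minimal usage sketch (assuming the {@link IgniteSnapshot} facade is exposed on a started
+ * persistent cluster, e.g. via {@code ignite.snapshot()}; the snapshot name is illustrative):
+ * <pre>{@code
+ * IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("snapshot_01");
+ *
+ * // Blocks until a consistent snapshot has been created on every baseline node.
+ * fut.get();
+ * }</pre>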
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix for files with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason message for the checkpoint forced to start a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default directory name for temporary snapshot files (loaded remote snapshots and partition delta files). */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since redundant buffers can lead to OOM errors. A direct
+     * buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be exhausted
+     * before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Busy lock to protect resources that are in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = LocalSnapshotSender::new;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of the locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpReq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult, SnapshotStartDiscoveryMessage::new);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time of the last cluster snapshot request start time on this node.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time of the last cluster snapshot request end time on this node.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot request on this node.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot request which fail with an error. " +
+                "This value will be 'null' if last snapshot request has been completed successfully.");
+        mreg.register("LocalSnapshotList", this::getSnapshots, List.class,
+            "The list of names of all snapshots currently saved on the local node with respect to " +
+                "the configured via IgniteConfiguration snapshot working path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpReq = clusterSnpReq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpReq != null &&
+                                snpReq.snpName.equals(sctx.snapshotName()) &&
+                                snpReq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("Snapshot operation interrupted. " +
+                                "One of baseline nodes left the cluster: " + leftNodeId));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot dir.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpReq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpReq + ']'));
+        }
+
+        Set<UUID> leftNodes = new HashSet<>(req.bltNodes);
+        leftNodes.removeAll(F.viewReadOnly(cctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            F.node2id()));
+
+        if (!leftNodes.isEmpty()) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Some of baseline nodes left the cluster " +
+                "prior to snapshot operation start: " + leftNodes));
+        }
+
+        Set<Integer> leftGrps = new HashSet<>(req.grpIds);
+        leftGrps.removeAll(cctx.cache().cacheGroupDescriptors().keySet());
+
+        if (!leftGrps.isEmpty()) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Some of requested cache groups doesn't exist " +
+                "on the local node [missed=" + leftGrps + ", nodeId=" + cctx.localNodeId() + ']'));
+        }
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        // Prepare the collection of cache group and partition pairs to be snapshotted.
+        // The cache group context may be 'null' on some nodes, e.g. when a node filter is set.
+        for (Integer grpId : req.grpIds) {
+            if (cctx.cache().cacheGroup(grpId) == null)
+                continue;
+
+            parts.put(grpId, null);
+        }
+
+        if (parts.isEmpty())
+            return new GridFinishedFuture<>();
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpReq = req;
+
+        return task0.chain(fut -> {
+            if (fut.error() == null)
+                return new SnapshotOperationResponse();
+            else
+                throw new GridClosureException(fut.error());
+        });
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        if (cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpReq = clusterSnpReq;
+
+        if (snpReq == null || !snpReq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation has not been fully completed " +
+                        "[err=" + err + ", snpReq=" + snpReq + ']'));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpReq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpReq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpReq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpReq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpReq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpReq = clusterSnpReq;
+
+        if (snpReq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpReq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpReq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpReq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpReq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr=" + snpReq.hasErr + ", fail=" + endFail + ", err=" + err + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpReq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpReq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpReq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpReq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpReq = clusterSnpReq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpReq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpReq.snpName));
+
+            // Schedule task on a checkpoint and wait when it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Collection of cache group and partition pairs to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
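+     * <p>
+     * A minimal sketch (hypothetical identifiers; requests partitions 0 and 1 of a single cache group):
+     * <pre>{@code
+     * snpMgr.requestRemoteSnapshot(rmtNodeId,
+     *     F.asMap(CU.cacheId("some-group"), new HashSet<>(Arrays.asList(0, 1))),
+     *     (part, pair) -> log.info("Partition received [file=" + part + ", pair=" + pair + ']'));
+     * }</pre>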
+     */
+    public IgniteInternalFuture<Void> requestRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on the remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Collection of cache group and partition pairs to be snapshotted.
+     * @param snpSndr Snapshot sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void localSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return locSndrFactory;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which a snapshot request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name.
+     * @return Relative path of the persistent data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
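+
+    // For example: databaseRelativePath("IgniteNode0") resolves to "db/IgniteNode0"
+    // (using the platform-specific file separator).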
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot directory resolved through given configuration.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storage processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close unfinished file page stores.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor that runs tasks sequentially, though not necessarily on a single thread:
+     * tasks are executed on the delegate's threads one after another. It is important for some
+     * {@link SnapshotSender}s to process sub-tasks sequentially since all of these sub-tasks may
+     * share a single socket channel to send data to.
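+     * <p>
+     * Illustrative sketch (hypothetical {@code sendChunk} method): the two tasks below never run
+     * concurrently, even if the delegate pool has many threads:
+     * <pre>{@code
+     * Executor seq = new SequentialExecutorWrapper(log, pool);
+     *
+     * seq.execute(() -> sendChunk(0)); // Runs first.
+     * seq.execute(() -> sendChunk(1)); // Starts only after the previous task completes.
+     * }</pre>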
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to the remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
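
The receiving node is expected to rebuild the destination file path from these attributes. The following is a hypothetical sketch of such resolution, assuming a part-<N>.bin partition naming convention and placeholder key strings; the actual key constants and the real receiver logic live elsewhere in the manager.

    import java.io.File;
    import java.io.Serializable;
    import java.util.Map;

    /** Hypothetical receiver-side resolution of the target partition file. */
    static File resolveTargetPartition(File snapshotsRoot, Map<String, Serializable> params) {
        String snpName = (String)params.get("snpName");            // Value of SNP_NAME_PARAM (assumed key).
        String nodePath = (String)params.get("dbNodePath");        // Value of SNP_DB_NODE_PATH_PARAM (assumed key).
        String cacheDirName = (String)params.get("cacheDirName");  // Value of SNP_CACHE_DIR_NAME_PARAM (assumed key).
        int partId = (Integer)params.get("partId");                // Value of SNP_PART_ID_PARAM (assumed key).

        // Assumed layout: <root>/<snapshot>/<node-path>/<cache-dir>/part-<N>.bin.
        File dir = new File(new File(new File(snapshotsRoot, snpName), nodePath), cacheDirName);

        return new File(dir, "part-" + partId + ".bin");
    }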
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved against the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+
 
 Review comment:
   Redundant NL


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409392157
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take a snapshot of the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load.
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // Zero out the sign bit.
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+                }
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                throw new RuntimeException(e);
+            }
+        }, 3, "cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(snpName);
+
+            U.await(loadLatch, 10, TimeUnit.SECONDS);
+
+            fut.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // cluster can be deactivated but we must test snapshot restore when binary recovery also occurred
 
 Review comment:
   Upcase, point


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409410157
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take a snapshot of the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load.
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // Zero out the sign bit.
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+                }
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                throw new RuntimeException(e);
+            }
+        }, 3, "cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(snpName);
+
+            U.await(loadLatch, 10, TimeUnit.SECONDS);
+
+            fut.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // cluster can be deactivated but we must test snapshot restore when binary recovery also occurred
+        stopAllGrids();
+
+        assertTrue("Snapshot directory must be empty for node not in baseline topology: " + notBltDirName,
+            !searchDirectoryRecursively(locSnpDir.toPath(), notBltDirName).isPresent());
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, snpName);
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + DEFAULT_CACHE_NAME,
+            CACHE_KEYS_RANGE, snpIg0.cache(DEFAULT_CACHE_NAME).size());
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + txCcfg.getName(),
+            CACHE_KEYS_RANGE, snpIg0.cache(txCcfg.getName()).size());
+
+        snpIg0.cache(DEFAULT_CACHE_NAME).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contains only negative values " +
+                "[cache=" + DEFAULT_CACHE_NAME + ", entry=" + e +']', (Integer)e.getValue() < 0));
+
+        snpIg0.cache(txCcfg.getName()).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contains only negative values " +
+                "[cache=" + txCcfg.getName() + ", entry=" + e + ']', (Integer)e.getValue() < 0));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotPrimaryBackupsTheSame() throws Exception {
+        int grids = 3;
+        AtomicInteger cacheKey = new AtomicInteger();
+
+        IgniteEx ignite = startGridsWithCache(grids, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        IgniteInternalFuture<Long> atLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                grid(gId).cache(DEFAULT_CACHE_NAME)
+                    .put(cacheKey.incrementAndGet(), 0);
+            }
+        }, 5, "atomic-cache-put-");
+
+        IgniteInternalFuture<Long> txLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                IgniteCache<Integer, Integer> txCache = grid(gId).getOrCreateCache(txCcfg);
+
+                try (Transaction tx = grid(gId).transactions().txStart()) {
+                    txCache.put(cacheKey.incrementAndGet(), 0);
+
+                    tx.commit();
+                }
+            }
+        }, 5, "tx-cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+            fut.get();
+        }
+        finally {
+            txLoadFut.cancel();
+            atLoadFut.cancel();
+        }
+
+        stopAllGrids();
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, cfg -> resolveSnapshotWorkDirectory(cfg).getAbsolutePath(), SNAPSHOT_NAME, false);
+
+        // Block whole rebalancing.
+        for (Ignite g : G.allGrids())
+            TestRecordingCommunicationSpi.spi(g).blockMessages((node, msg) -> msg instanceof GridDhtPartitionDemandMessage);
+
+        snpIg0.cluster().state(ACTIVE);
+
+        assertFalse("Primary and backup in snapshot must have the same counters. Rebalance must not happen.",
+            GridTestUtils.waitForCondition(() -> {
+                boolean hasMsgs = false;
+
+                for (Ignite g : G.allGrids())
+                    hasMsgs |= TestRecordingCommunicationSpi.spi(g).hasBlockedMessages();
+
+                return hasMsgs;
+            }, REBALANCE_AWAIT_TIME));
+
+        TestRecordingCommunicationSpi.stopBlockAll();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRejectCacheStopDuringClusterSnapshot() throws Exception {
+        // Block the full message so that the cluster-wide snapshot operation is not fully completed.
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> {
+            if (msg instanceof FullMessage) {
+                FullMessage<?> msg0 = (FullMessage<?>)msg;
+
+                assertEquals("Snapshot distributed process must be used",
+                    DistributedProcess.DistributedProcessType.START_SNAPSHOT.ordinal(), msg0.type());
+
+                assertTrue("Snapshot has to be finished successfully on all nodes", msg0.error().isEmpty());
+
+                return true;
+            }
+
+            return false;
+        });
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // Creation of new caches should not be blocked.
+        ignite.getOrCreateCache(dfltCacheCfg.setName("default2"))
+            .put(1, 1);
+
+        forceCheckpoint();
+
+        assertThrowsAnyCause(log,
+            () -> {
+                ignite.destroyCache(DEFAULT_CACHE_NAME);
+
+                return 0;
+            },
+            IgniteCheckedException.class,
+            SNP_IN_PROGRESS_ERR_MSG);
+
+        spi.unblock();
+
+        fut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testBltChangeDuringClusterSnapshot() throws Exception {
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(3);
+
+        long topVer = ignite.cluster().topologyVersion();
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // Non-baseline node joins successfully.
+        String grid4Dir = folderName(startGrid(4));
+
+        // Non-baseline node leaves the cluster; the snapshot is not affected.
+        stopGrid(4);
+
+        // Client node must connect successfully.
+        startClientGrid(4);
+
+        // Changing the baseline completes successfully.
+        ignite.cluster().setBaselineTopology(topVer);
+
+        spi.unblock();
+
+        fut.get();
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + grid4Dir,
+            !searchDirectoryRecursively(snp(ignite).snapshotLocalDir(SNAPSHOT_NAME).toPath(), grid4Dir).isPresent());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotExOnInitiatorLeft() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        ignite.close();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            NodeStoppingException.class,
+            SNP_NODE_STOPPING_ERR_MSG);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotExistsException() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteException.class,
+            "Snapshot with given name already exists.");
+
+        stopAllGrids();
+
+        // Check that snapshot has not been accidentally deleted.
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotCleanedOnLeft() throws Exception {
+        CountDownLatch block = new CountDownLatch(1);
+        CountDownLatch partProcessed = new CountDownLatch(1);
+
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+        String dirNameIgnite0 = folderName(ignite);
+
+        String dirNameIgnite1 = folderName(grid(1));
+
+        snp(grid(1)).localSnapshotSenderFactory(
+            blockingLocalSnapshotSender(grid(1), partProcessed, block));
+
+        TestRecordingCommunicationSpi commSpi1 = TestRecordingCommunicationSpi.spi(grid(1));
+        commSpi1.blockMessages((node, msg) -> msg instanceof SingleNodeMessage);
+
+        IgniteFuture<?> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        U.await(partProcessed);
+
+        stopGrid(1);
+
+        block.countDown();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + dirNameIgnite0,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite0).isPresent());
+
+        startGrid(1);
+
+        awaitPartitionMapExchange();
+
+        // Snapshot directory must be cleaned.
+        assertTrue("Snapshot directory must be empty for node 1 due to snapshot future fail: " + dirNameIgnite1,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite1).isPresent());
+
+        List<String> allSnapshots = snp(ignite).getSnapshots();
+
+        assertTrue("Snapshot directory must be empty due to snapshot fail: " + allSnapshots,
+            allSnapshots.isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRecoveryClusterSnapshotJvmHalted() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        String grid0Dir = folderName(ignite);
+        String grid1Dir = folderName(grid(1));
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+
+        jvm = true;
+
+        IgniteConfiguration cfg2 = optimize(getConfiguration(getTestIgniteInstanceName(2)));
+
+        cfg2.getDataStorageConfiguration()
+            .setFileIOFactory(new HaltJvmFileIOFactory(new RandomAccessFileIOFactory(),
+                (Predicate<File> & Serializable) file -> {
+                    // Trying to create FileIO over partition file.
+                    return file.getAbsolutePath().contains(SNAPSHOT_NAME);
+                }));
+
+        startGrid(cfg2);
+
+        String grid2Dir = U.maskForFileName(cfg2.getConsistentId().toString());
+
+        jvm = false;
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        awaitPartitionMapExchange();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty: " + grid0Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid0Dir).isPresent());
+
+        assertTrue("Snapshot directory must be empty: " + grid1Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid1Dir).isPresent());
+
+        assertTrue("Snapshot directory must exist due to grid2 has been halted and cleanup not fully performed: " + grid2Dir,
+            searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        IgniteEx grid2 = startGrid(2);
+
+        assertTrue("Snapshot directory must be empty after recovery: " + grid2Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        awaitPartitionMapExchange();
+
+        assertTrue("Snapshot directory must be empty", grid2.snapshot().getSnapshots().isEmpty());
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
+            .get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithRebalancing() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi commSpi = TestRecordingCommunicationSpi.spi(ignite);
+        commSpi.blockMessages((node, msg) -> msg instanceof GridDhtPartitionSupplyMessage);
+
+        startGrid(2);
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        commSpi.waitForBlocked();
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        commSpi.stopBlock(true);
+
+        fut.get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(3, SNAPSHOT_NAME);
+
+        awaitPartitionMapExchange();
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithExplicitPath() throws Exception {
+        File exSnpDir = U.resolveWorkDirectory(U.defaultWorkDirectory(), "ex_snapshots", true);
+
+        try {
+            IgniteEx ignite = null;
+
+            for (int i = 0; i < 2; i++) {
+                IgniteConfiguration cfg = optimize(getConfiguration(getTestIgniteInstanceName(i)));
+
+                cfg.setSnapshotPath(exSnpDir.getAbsolutePath());
+
+                ignite = startGrid(cfg);
+            }
+
+            ignite.cluster().baselineAutoAdjustEnabled(false);
+            ignite.cluster().state(ACTIVE);
+
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ignite.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+            forceCheckpoint();
 
 Review comment:
   Why do we need explicit checkpoint here?


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409512635
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start a grid node with preloaded data.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Collection of cache group and partition pairs to be included in the snapshot.
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently there is only a single node in the
+        // cluster), so the snapshot partitions must contain the same data as the partitions
+        // left after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
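
The CRC assertions above rely on a calculateCRC32Partitions helper defined in the test superclass. A minimal sketch of what such a per-partition CRC32 helper could look like is given below, assuming partition files match part-*.bin; the real helper may compute the checksum differently (e.g. via FastCrc).

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.zip.CRC32;

    /** Illustrative sketch: CRC32 of every partition file in a cache directory. */
    static Map<String, Integer> crc32Partitions(Path cacheDir) throws IOException {
        Map<String, Integer> crcs = new HashMap<>();

        try (DirectoryStream<Path> parts = Files.newDirectoryStream(cacheDir, "part-*.bin")) {
            for (Path part : parts) {
                CRC32 crc = new CRC32();
                byte[] buf = new byte[8192];

                try (InputStream in = Files.newInputStream(part)) {
                    for (int n; (n = in.read(buf)) > 0; )
                        crc.update(buf, 0, n);
                }

                crcs.put(part.getFileName().toString(), (int)crc.getValue());
            }
        }

        return crcs;
    }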
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency is checked by starting a test node right from the snapshot directory and verifying
+     * that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start a grid node with preloaded data.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; it must be included into the snapshot with the correct value multiplier.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, valMultiplier * i));
+
+            // The snapshot is still in the INIT state. beforeCheckpoint has been skipped
+            // because a checkpoint is already running, so we need to schedule the next one
+            // right after the current one completes.
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, SNAPSHOT_NAME));
+
+            snpFutTask.awaitStarted();
+
+            db.forceCheckpoint("snapshot is ready to be created")
+                .futureFor(CheckpointState.MARKER_STORED_TO_DISK)
+                .get();
+
+            // Change data after snapshot.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, 3 * i));
+
+            // On the next checkpoint the snapshot must copy pages to the delta file before writing them to a partition.
+            forceCheckpoint(ig);
+
+            slowCopy.countDown();
+
+            snpFutTask.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // Now can stop the node and check created snapshots.
+        stopGrid(0);
+
+        cleanPersistenceDir(ig.name());
+
+        // Start Ignite instance from snapshot directory.
+        IgniteEx ig2 = startGridsFromSnapshot(1, SNAPSHOT_NAME);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++) {
+            assertEquals("snapshot data consistency violation [key=" + i + ']',
+                i * valMultiplier, ((TestOrderItem)ig2.cache(DEFAULT_CACHE_NAME).get(i)).value);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitionNotEnoughSpace() throws Exception {
+        String err_msg = "Test exception. Not enough space.";
+        AtomicInteger throwCntr = new AtomicInteger();
+        RandomAccessFileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+        IgniteEx ig = startGridWithCache(dfltCacheCfg.setAffinity(new ZeroPartitionAffinityFunction()),
+            CACHE_KEYS_RANGE);
+
+        // Change data after backup.
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, 2 * i);
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+
+        IgniteSnapshotManager mgr = snp(ig);
+
+        mgr.ioFactory(new FileIOFactory() {
+            @Override public FileIO create(File file, OpenOption... modes) throws IOException {
+                FileIO fileIo = ioFactory.create(file, modes);
+
+                if (file.getName().equals(IgniteSnapshotManager.partDeltaFileName(0)))
+                    return new FileIODecorator(fileIo) {
+                        @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
+                            if (throwCntr.incrementAndGet() == 3)
+                                throw new IOException(errMsg);
+
+                            return super.writeFully(srcBuf);
+                        }
+                    };
+
+                return fileIo;
+            }
+        });
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        // Check that the right exception is thrown.
+        assertThrowsAnyCause(log,
+            snpFut::get,
+            IOException.class,
+            errMsg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
+        String err_msg = "Test. Fail to copy partition: ";
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), new HashSet<>(Collections.singletonList(0)));
+
+        IgniteSnapshotManager mgr0 = snp(ig);
+
+        IgniteInternalFuture<?> fut = startLocalSnapshotTask(ig.context().cache().context(),
+            SNAPSHOT_NAME,
+            parts,
+            new DelegateSnapshotSender(log, mgr0.snapshotExecutorService(),
+                mgr0.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    if (pair.getPartitionId() == 0)
+                        throw new IgniteException(errMsg + pair);
+
+                    delegate.sendPart0(part, cacheDirName, pair, length);
+                }
+            });
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteException.class,
+            errMsg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemotePartitionsWithLoad() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        AtomicInteger cntr = new AtomicInteger();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, cntr.incrementAndGet());
+
+        GridCacheSharedContext<?, ?> cctx1 = grid(1).context().cache().context();
+        GridCacheDatabaseSharedManager db1 = (GridCacheDatabaseSharedManager)cctx1.database();
+
+        forceCheckpoint();
+
+        Map<String, Integer> rmtPartCRCs = new HashMap<>();
+        CountDownLatch cancelLatch = new CountDownLatch(1);
+
+        db1.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                SnapshotFutureTask task = cctx1.snapshotMgr().lastScheduledRemoteSnapshotTask(grid(0).localNode().id());
+
+                // Skip the first remote snapshot creation since it will be cancelled.
+                if (task == null || cancelLatch.getCount() > 0)
+                    return;
+
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(task,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty()) {
+                    assert rmtPartCRCs.isEmpty();
+
+                    // Calculate the actual partition CRCs when the checkpoint finishes on this node.
+                    ctx.finishedStateFut().listen(f -> {
+                        File cacheWorkDir = ((FilePageStoreManager)grid(1).context().cache().context().pageStore())
+                            .cacheWorkDir(dfltCacheCfg);
+
+                        rmtPartCRCs.putAll(calculateCRC32Partitions(cacheWorkDir));
+                    });
+                }
+            }
+        });
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+        Map<String, Integer> snpPartCRCs = new HashMap<>();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted())
+                ig0.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(), cntr.incrementAndGet());
+        }, 5, "cache-loader-");
+
+        try {
+            // Snapshot must be taken on node1 and transmitted to node0.
+            IgniteInternalFuture<?> fut = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                new BiConsumer<File, GroupPartitionId>() {
+                    @Override public void accept(File file, GroupPartitionId gprPartId) {
+                        log.info("Snapshot partition received successfully [rmtNodeId=" + rmtNodeId +
+                            ", part=" + file.getAbsolutePath() + ", gprPartId=" + gprPartId + ']');
+
+                        cancelLatch.countDown();
+                    }
+                });
+
+            cancelLatch.await();
+
+            fut.cancel();
+
+            IgniteInternalFuture<?> fut2 = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                (part, pair) -> {
+                    try {
+                        snpPartCRCs.put(part.getName(), FastCrc.calcCrc(part));
+                    }
+                    catch (IOException e) {
+                        throw new IgniteException(e);
+                    }
+                });
+
+            fut2.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        assertEquals("Partitions from remote node must have the same CRCs as those which have been received",
+            rmtPartCRCs, snpPartCRCs);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteOnBothNodes() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        forceCheckpoint(ig0);
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+        IgniteSnapshotManager mgr1 = snp(grid(1));
+
+        UUID node0 = grid(0).localNode().id();
+        UUID node1 = grid(1).localNode().id();
+
+        Map<Integer, Set<Integer>> fromNode1 = owningParts(ig0,
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node1);
+
+        Map<Integer, Set<Integer>> fromNode0 = owningParts(grid(1),
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> futFrom1To0 = mgr0.requestRemoteSnapshot(node1, fromNode1,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode1.get(pair.getGroupId())
+                    .remove(pair.getPartitionId())));
+        IgniteInternalFuture<?> futFrom0To1 = mgr1.requestRemoteSnapshot(node0, fromNode0,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode0.get(pair.getGroupId())
+                .remove(pair.getPartitionId())));
+
+        futFrom0To1.get();
+        futFrom1To0.get();
+
+        assertTrue("Not all of partitions have been received: " + fromNode1,
+            fromNode1.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+        assertTrue("Not all of partitions have been received: " + fromNode0,
+            fromNode0.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = ClusterTopologyCheckedException.class)
+    public void testRemoteSnapshotRequestedNodeLeft() throws Exception {
+        IgniteEx ig0 = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+        IgniteEx ig1 = startGrid(1);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        CountDownLatch hold = new CountDownLatch(1);
+
+        ((GridCacheDatabaseSharedManager)ig1.context().cache().context().database())
+            .addCheckpointListener(new DbCheckpointListener() {
+                /** {@inheritDoc} */
+                @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
+                    // Listener will be executed inside the checkpoint thread.
+                    U.await(hold);
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onMarkCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+            });
+
+        UUID rmtNodeId = ig1.localNode().id();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        snp(ig0).requestRemoteSnapshot(rmtNodeId, parts, (part, grp) -> {});
+
+        IgniteInternalFuture<?>[] futs = new IgniteInternalFuture[1];
+
+        assertTrue(GridTestUtils.waitForCondition(new GridAbsPredicate() {
+            @Override public boolean apply() {
+                IgniteInternalFuture<Boolean> snpFut = snp(ig1)
+                    .lastScheduledRemoteSnapshotTask(ig0.localNode().id());
+
+                if (snpFut == null)
+                    return false;
+                else
+                    futs[0] = snpFut;
+
+                return true;
+            }
+        }, 5_000L));
+
+        stopGrid(0);
+
+        hold.countDown();
+
+        futs[0].get();
+    }
+
+    /**
+     * <pre>
+     * 1. Start 2 nodes.
+     * 2. Request a snapshot from the 2-nd node.
+     * 3. Block the snapshot-request message.
+     * 4. Start a 3-rd node and change the BLT.
+     * 5. Stop the 3-rd node and change the BLT.
+     * 6. The 2-nd node now has MOVING partitions to be preloaded.
+     * 7. Release the snapshot-request message.
+     * 8. Snapshot creation should fail since MOVING partitions cannot be snapshotted.
+     * </pre>
+     *
+     * @throws Exception If fails.
+     */
+    @Test(expected = IgniteCheckedException.class)
+    public void testRemoteOutdatedSnapshot() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        awaitPartitionMapExchange();
+
+        forceCheckpoint();
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> snpFut = mgr0.requestRemoteSnapshot(rmtNodeId,
+            owningParts(ig0, new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))), rmtNodeId),
+            (part, grp) -> {});
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .waitForBlocked();
+
+        startGrid(2);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        stopGrid(2);
+
+        TestRecordingCommunicationSpi.spi(grid(1))
+            .blockMessages((node, msg) ->  msg instanceof GridDhtPartitionDemandMessage);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);
+
+        snpFut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = IgniteCheckedException.class)
+    public void testLocalSnapshotOnCacheStopped() throws Exception {
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(1);
+
+        ig.cluster().state(ClusterState.ACTIVE);
+
+        awaitPartitionMapExchange();
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        CountDownLatch cpLatch = new CountDownLatch(1);
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(cpLatch);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        IgniteCache<?, ?> cache = ig.getOrCreateCache(DEFAULT_CACHE_NAME);
+
+        cache.destroy();
+
+        cpLatch.countDown();
+
+        snpFut.get(5_000, TimeUnit.MILLISECONDS);
+    }
+
+    /**
+     * @param src Source node to calculate.
+     * @param grps Groups to collect owning parts.
+     * @param rmtNodeId Remote node id.
+     * @return Map of collected parts.
+     */
+    private static Map<Integer, Set<Integer>> owningParts(IgniteEx src, Set<Integer> grps, UUID rmtNodeId) {
+        Map<Integer, Set<Integer>> result = new HashMap<>();
+
+        for (Integer grpId : grps) {
+            Set<Integer> parts = src.context()
+                .cache()
+                .cacheGroup(grpId)
+                .topology()
+                .partitions(rmtNodeId)
+                .entrySet()
+                .stream()
+                .filter(p -> p.getValue() == GridDhtPartitionState.OWNING)
+                .map(Map.Entry::getKey)
+                .collect(Collectors.toSet());
+
+            result.put(grpId, parts);
+        }
+
+        return result;
+    }
+
+    /**
+     * @param cctx Shared cache context.
+     * @param snpName Unique snapshot name.
+     * @param parts Collection of cache group and partition pairs to be snapshotted.
+     * @param snpSndr Sender which is used for snapshot sub-task processing.
+     * @return Future which will be completed when snapshot is done.
+     */
+    private static SnapshotFutureTask startLocalSnapshotTask(
+        GridCacheSharedContext<?, ?> cctx,
+        String snpName,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) throws IgniteCheckedException {
+        SnapshotFutureTask snpFutTask = cctx.snapshotMgr().registerSnapshotTask(snpName, cctx.localNodeId(), parts, snpSndr);
+
+        snpFutTask.start();
+
+        // The snapshot is still in the INIT state. beforeCheckpoint has been skipped
+        // because a checkpoint is already running, and we need to schedule the next one
+        // right after the current one completes.
+        cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, snpName));
+
+        snpFutTask.awaitStarted();
+
+        return snpFutTask;
+    }
+
+    /** */
+    private static class ZeroPartitionAffinityFunction extends RendezvousAffinityFunction {
+        @Override public int partition(Object key) {
+            return 0;
+        }
+    }
+
+    /** */
+    private static class TestOrderItem implements Serializable {
+        /** Serial version. */
+        private static final long serialVersionUID = 0L;
+
+        /** Order key. */
+        private final int key;
+
+        /** Order value. */
+        private final int value;
 
 Review comment:
   Abbreviation should be used
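
A minimal sketch of the rename this comment asks for, assuming it refers to the Ignite coding guideline that prefers the "val" abbreviation over "value" (the constructor below is a hypothetical reconstruction shown only for illustration, since the original one is outside this hunk):

    /** Order value. */
    private final int val;

    /** Hypothetical constructor, included only to show the renamed field in use. */
    TestOrderItem(int key, int val) {
        this.key = key;
        this.val = val;
    }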

[GitHub] [ignite] alamar commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alamar commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r405091954
 
 

 ##########
 File path: modules/platforms/dotnet/Apache.Ignite.Core.Tests/Services/ServicesTest.cs
 ##########
 @@ -870,20 +870,6 @@ public void TestCallJavaService()
                 binSvc.testBinaryObject(
                     Grid1.GetBinary().ToBinary<IBinaryObject>(new PlatformComputeBinarizable {Field = 6}))
                     .GetField<int>("Field"));
-            
 
 Review comment:
   Why!?

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409414130
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take snapshot from the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // Zero out the sign bit.
+                    grid(txIdx).cache(txCcfg.getName()).put(txKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+
+                    int atomicIdx = R.nextInt(grids);
+
+                    grid(atomicIdx).cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), R.nextInt() & Integer.MAX_VALUE);
+                }
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                throw new RuntimeException(e);
+            }
+        }, 3, "cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(snpName);
+
+            U.await(loadLatch, 10, TimeUnit.SECONDS);
+
+            fut.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // The cluster can be deactivated, but we must test snapshot restore when binary recovery has also occurred.
+        stopAllGrids();
+
+        assertTrue("Snapshot directory must be empty for node not in baseline topology: " + notBltDirName,
+            !searchDirectoryRecursively(locSnpDir.toPath(), notBltDirName).isPresent());
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, snpName);
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + DEFAULT_CACHE_NAME,
+            CACHE_KEYS_RANGE, snpIg0.cache(DEFAULT_CACHE_NAME).size());
+
+        assertEquals("The number of all (primary + backup) cache keys mismatch for cache: " + txCcfg.getName(),
+            CACHE_KEYS_RANGE, snpIg0.cache(txCcfg.getName()).size());
+
+        snpIg0.cache(DEFAULT_CACHE_NAME).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contains only negative values " +
+                "[cache=" + DEFAULT_CACHE_NAME + ", entry=" + e +']', (Integer)e.getValue() < 0));
+
+        snpIg0.cache(txCcfg.getName()).query(new ScanQuery<>(null))
+            .forEach(e -> assertTrue("Snapshot must contains only negative values " +
+                "[cache=" + txCcfg.getName() + ", entry=" + e + ']', (Integer)e.getValue() < 0));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotPrimaryBackupsTheSame() throws Exception {
+        int grids = 3;
+        AtomicInteger cacheKey = new AtomicInteger();
+
+        IgniteEx ignite = startGridsWithCache(grids, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        IgniteInternalFuture<Long> atLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                grid(gId).cache(DEFAULT_CACHE_NAME)
+                    .put(cacheKey.incrementAndGet(), 0);
+            }
+        }, 5, "atomic-cache-put-");
+
+        IgniteInternalFuture<Long> txLoadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted()) {
+                int gId = R.nextInt(grids);
+
+                IgniteCache<Integer, Integer> txCache = grid(gId).getOrCreateCache(txCcfg);
+
+                try (Transaction tx = grid(gId).transactions().txStart()) {
+                    txCache.put(cacheKey.incrementAndGet(), 0);
+
+                    tx.commit();
+                }
+            }
+        }, 5, "tx-cache-put-");
+
+        try {
+            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+            fut.get();
+        }
+        finally {
+            txLoadFut.cancel();
+            atLoadFut.cancel();
+        }
+
+        stopAllGrids();
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, cfg -> resolveSnapshotWorkDirectory(cfg).getAbsolutePath(), SNAPSHOT_NAME, false);
+
+        // Block whole rebalancing.
+        for (Ignite g : G.allGrids())
+            TestRecordingCommunicationSpi.spi(g).blockMessages((node, msg) -> msg instanceof GridDhtPartitionDemandMessage);
+
+        snpIg0.cluster().state(ACTIVE);
+
+        assertFalse("Primary and backup in snapshot must have the same counters. Rebalance must not happen.",
+            GridTestUtils.waitForCondition(() -> {
+                boolean hasMsgs = false;
+
+                for (Ignite g : G.allGrids())
+                    hasMsgs |= TestRecordingCommunicationSpi.spi(g).hasBlockedMessages();
+
+                return hasMsgs;
+            }, REBALANCE_AWAIT_TIME));
+
+        TestRecordingCommunicationSpi.stopBlockAll();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRejectCacheStopDuringClusterSnapshot() throws Exception {
+        // Block the full message, so the cluster-wide snapshot operation will not be fully completed.
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> {
+            if (msg instanceof FullMessage) {
+                FullMessage<?> msg0 = (FullMessage<?>)msg;
+
+                assertEquals("Snapshot distributed process must be used",
+                    DistributedProcess.DistributedProcessType.START_SNAPSHOT.ordinal(), msg0.type());
+
+                assertTrue("Snapshot has to be finished successfully on all nodes", msg0.error().isEmpty());
+
+                return true;
+            }
+
+            return false;
+        });
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // Creation of new caches should not be blocked.
+        ignite.getOrCreateCache(dfltCacheCfg.setName("default2"))
+            .put(1, 1);
+
+        forceCheckpoint();
+
+        assertThrowsAnyCause(log,
+            () -> {
+                ignite.destroyCache(DEFAULT_CACHE_NAME);
+
+                return 0;
+            },
+            IgniteCheckedException.class,
+            SNP_IN_PROGRESS_ERR_MSG);
+
+        spi.unblock();
+
+        fut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testBltChangeDuringClusterSnapshot() throws Exception {
+        IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(3);
+
+        long topVer = ignite.cluster().topologyVersion();
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        // A node not in the baseline topology joins successfully.
+        String grid4Dir = folderName(startGrid(4));
+
+        // The non-baseline node leaves the cluster; the snapshot is not affected.
+        stopGrid(4);
+
+        // Client node must connect successfully.
+        startClientGrid(4);
+
+        // Changing the baseline topology completes successfully.
+        ignite.cluster().setBaselineTopology(topVer);
+
+        spi.unblock();
+
+        fut.get();
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + grid4Dir,
+            !searchDirectoryRecursively(snp(ignite).snapshotLocalDir(SNAPSHOT_NAME).toPath(), grid4Dir).isPresent());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotExOnInitiatorLeft() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        BlockingCustomMessageDiscoverySpi spi = discoSpi(ignite);
+        spi.block((msg) -> msg instanceof FullMessage);
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        spi.waitBlocked(10_000L);
+
+        ignite.close();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            NodeStoppingException.class,
+            SNP_NODE_STOPPING_ERR_MSG);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotExistsException() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteException.class,
+            "Snapshot with given name already exists.");
+
+        stopAllGrids();
+
+        // Check that snapshot has not been accidentally deleted.
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotCleanedOnLeft() throws Exception {
+        CountDownLatch block = new CountDownLatch(1);
+        CountDownLatch partProcessed = new CountDownLatch(1);
+
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+        String dirNameIgnite0 = folderName(ignite);
+
+        String dirNameIgnite1 = folderName(grid(1));
+
+        snp(grid(1)).localSnapshotSenderFactory(
+            blockingLocalSnapshotSender(grid(1), partProcessed, block));
+
+        TestRecordingCommunicationSpi commSpi1 = TestRecordingCommunicationSpi.spi(grid(1));
+        commSpi1.blockMessages((node, msg) -> msg instanceof SingleNodeMessage);
+
+        IgniteFuture<?> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        U.await(partProcessed);
+
+        stopGrid(1);
+
+        block.countDown();
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty for node 0 due to snapshot future fail: " + dirNameIgnite0,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite0).isPresent());
+
+        startGrid(1);
+
+        awaitPartitionMapExchange();
+
+        // Snapshot directory must be cleaned.
+        assertTrue("Snapshot directory must be empty for node 1 due to snapshot future fail: " + dirNameIgnite1,
+            !searchDirectoryRecursively(locSnpDir.toPath(), dirNameIgnite1).isPresent());
+
+        List<String> allSnapshots = snp(ignite).getSnapshots();
+
+        assertTrue("Snapshot directory must be empty due to snapshot fail: " + allSnapshots,
+            allSnapshots.isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testRecoveryClusterSnapshotJvmHalted() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        String grid0Dir = folderName(ignite);
+        String grid1Dir = folderName(grid(1));
+        File locSnpDir = snp(ignite).snapshotLocalDir(SNAPSHOT_NAME);
+
+        jvm = true;
+
+        IgniteConfiguration cfg2 = optimize(getConfiguration(getTestIgniteInstanceName(2)));
+
+        cfg2.getDataStorageConfiguration()
+            .setFileIOFactory(new HaltJvmFileIOFactory(new RandomAccessFileIOFactory(),
+                (Predicate<File> & Serializable) file -> {
+                    // Trying to create FileIO over partition file.
+                    return file.getAbsolutePath().contains(SNAPSHOT_NAME);
+                }));
+
+        startGrid(cfg2);
+
+        String grid2Dir = U.maskForFileName(cfg2.getConsistentId().toString());
+
+        jvm = false;
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        awaitPartitionMapExchange();
+
+        assertThrowsAnyCause(log,
+            () -> ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(),
+            IgniteCheckedException.class,
+            "Snapshot creation has been finished with an error");
+
+        assertTrue("Snapshot directory must be empty: " + grid0Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid0Dir).isPresent());
+
+        assertTrue("Snapshot directory must be empty: " + grid1Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid1Dir).isPresent());
+
+        assertTrue("Snapshot directory must exist due to grid2 has been halted and cleanup not fully performed: " + grid2Dir,
+            searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        IgniteEx grid2 = startGrid(2);
+
+        assertTrue("Snapshot directory must be empty after recovery: " + grid2Dir,
+            !searchDirectoryRecursively(locSnpDir.toPath(), grid2Dir).isPresent());
+
+        awaitPartitionMapExchange();
+
+        assertTrue("Snapshot directory must be empty", grid2.snapshot().getSnapshots().isEmpty());
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
+            .get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(2, SNAPSHOT_NAME);
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithRebalancing() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi commSpi = TestRecordingCommunicationSpi.spi(ignite);
+        commSpi.blockMessages((node, msg) -> msg instanceof GridDhtPartitionSupplyMessage);
+
+        startGrid(2);
+
+        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
+
+        commSpi.waitForBlocked();
+
+        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot(SNAPSHOT_NAME);
+
+        commSpi.stopBlock(true);
+
+        fut.get();
+
+        stopAllGrids();
+
+        IgniteEx snp = startGridsFromSnapshot(3, SNAPSHOT_NAME);
+
+        awaitPartitionMapExchange();
+
+        assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithExplicitPath() throws Exception {
+        File exSnpDir = U.resolveWorkDirectory(U.defaultWorkDirectory(), "ex_snapshots", true);
+
+        try {
+            IgniteEx ignite = null;
+
+            for (int i = 0; i < 2; i++) {
+                IgniteConfiguration cfg = optimize(getConfiguration(getTestIgniteInstanceName(i)));
+
+                cfg.setSnapshotPath(exSnpDir.getAbsolutePath());
+
+                ignite = startGrid(cfg);
+            }
+
+            ignite.cluster().baselineAutoAdjustEnabled(false);
+            ignite.cluster().state(ACTIVE);
+
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ignite.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+            forceCheckpoint();
+
+            ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
+                .get();
+
+            stopAllGrids();
+
+            IgniteEx snp = startGridsFromSnapshot(2, cfg -> exSnpDir.getAbsolutePath(), SNAPSHOT_NAME, true);
+
+            assertSnapshotCacheKeys(snp.cache(dfltCacheCfg.getName()));
+        }
+        finally {
+            stopAllGrids();
+
+            U.delete(exSnpDir);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotMetrics() throws Exception {
+        String newSnapshotName = SNAPSHOT_NAME + "_new";
+        CountDownLatch deltaApply = new CountDownLatch(1);
+        CountDownLatch deltaBlock = new CountDownLatch(1);
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        MetricRegistry mreg0 = ignite.context().metric().registry(SNAPSHOT_METRICS);
+
+        LongMetric startTime = mreg0.findMetric("LastSnapshotStartTime");
 
 Review comment:
   Also please check all metrics before first snapshot attempt.
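
A minimal sketch of such a pre-snapshot check, assuming the timestamp metrics registered by the snapshot manager ("LastSnapshotStartTime", "LastSnapshotEndTime") default to zero before the first snapshot request on the node:

    MetricRegistry mreg = ignite.context().metric().registry(SNAPSHOT_METRICS);

    LongMetric startTime = mreg.findMetric("LastSnapshotStartTime");
    LongMetric endTime = mreg.findMetric("LastSnapshotEndTime");

    // Before the first snapshot attempt both timestamps must still hold their defaults.
    assertEquals(0, startTime.value());
    assertEquals(0, endTime.value());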

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409741300
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start a grid node with preloaded cache data.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into the checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Collection of cache group and partition pairs to be snapshotted.
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently only a single node in the cluster),
+        // so we must have the same data in the snapshot partitions and in those which are left
+        // after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency is checked by starting a test node right from the snapshot directory and verifying
+     * that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start a grid node and load data.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; it must be included into the snapshot with the correct value multiplier.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, valMultiplier * i));
+
+            // The snapshot is still in the INIT state. beforeCheckpoint has been skipped
+            // because a checkpoint is already running, and we need to schedule the next one
+            // right after the current one completes.
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, SNAPSHOT_NAME));
+
+            snpFutTask.awaitStarted();
+
+            db.forceCheckpoint("snapshot is ready to be created")
+                .futureFor(CheckpointState.MARKER_STORED_TO_DISK)
+                .get();
+
+            // Change data after snapshot.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, 3 * i));
+
+            // On the next checkpoint, the snapshot must copy each page to a delta file before writing it to a partition.
+            forceCheckpoint(ig);
+
+            slowCopy.countDown();
+
+            snpFutTask.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // Now we can stop the node and check the created snapshot.
+        stopGrid(0);
+
+        cleanPersistenceDir(ig.name());
+
+        // Start Ignite instance from snapshot directory.
+        IgniteEx ig2 = startGridsFromSnapshot(1, SNAPSHOT_NAME);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++) {
+            assertEquals("snapshot data consistency violation [key=" + i + ']',
+                i * valMultiplier, ((TestOrderItem)ig2.cache(DEFAULT_CACHE_NAME).get(i)).value);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitionNotEnoughSpace() throws Exception {
+        String err_msg = "Test exception. Not enough space.";
+        AtomicInteger throwCntr = new AtomicInteger();
+        RandomAccessFileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+        IgniteEx ig = startGridWithCache(dfltCacheCfg.setAffinity(new ZeroPartitionAffinityFunction()),
+            CACHE_KEYS_RANGE);
+
+        // Change data after backup.
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, 2 * i);
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+
+        IgniteSnapshotManager mgr = snp(ig);
+
+        mgr.ioFactory(new FileIOFactory() {
+            @Override public FileIO create(File file, OpenOption... modes) throws IOException {
+                FileIO fileIo = ioFactory.create(file, modes);
+
+                if (file.getName().equals(IgniteSnapshotManager.partDeltaFileName(0)))
+                    return new FileIODecorator(fileIo) {
+                        @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
+                            if (throwCntr.incrementAndGet() == 3)
+                                throw new IOException(errMsg);
+
+                            return super.writeFully(srcBuf);
+                        }
+                    };
+
+                return fileIo;
+            }
+        });
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        // Check that the right exception is thrown.
+        assertThrowsAnyCause(log,
+            snpFut::get,
+            IOException.class,
+            errMsg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
+        String err_msg = "Test. Fail to copy partition: ";
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), new HashSet<>(Collections.singletonList(0)));
+
+        IgniteSnapshotManager mgr0 = snp(ig);
+
+        IgniteInternalFuture<?> fut = startLocalSnapshotTask(ig.context().cache().context(),
+            SNAPSHOT_NAME,
+            parts,
+            new DelegateSnapshotSender(log, mgr0.snapshotExecutorService(),
+                mgr0.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    if (pair.getPartitionId() == 0)
+                        throw new IgniteException(errMsg + pair);
+
+                    delegate.sendPart0(part, cacheDirName, pair, length);
+                }
+            });
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteException.class,
+            errMsg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemotePartitionsWithLoad() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        AtomicInteger cntr = new AtomicInteger();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, cntr.incrementAndGet());
+
+        GridCacheSharedContext<?, ?> cctx1 = grid(1).context().cache().context();
+        GridCacheDatabaseSharedManager db1 = (GridCacheDatabaseSharedManager)cctx1.database();
+
+        forceCheckpoint();
+
+        Map<String, Integer> rmtPartCRCs = new HashMap<>();
+        CountDownLatch cancelLatch = new CountDownLatch(1);
+
+        db1.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                SnapshotFutureTask task = cctx1.snapshotMgr().lastScheduledRemoteSnapshotTask(grid(0).localNode().id());
+
+                // Skip the first remote snapshot creation since it will be cancelled.
+                if (task == null || cancelLatch.getCount() > 0)
+                    return;
+
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(task,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty()) {
+                    assert rmtPartCRCs.isEmpty();
+
+                    // Calculate the actual partition CRCs once the checkpoint has finished on this node.
+                    ctx.finishedStateFut().listen(f -> {
+                        File cacheWorkDir = ((FilePageStoreManager)grid(1).context().cache().context().pageStore())
+                            .cacheWorkDir(dfltCacheCfg);
+
+                        rmtPartCRCs.putAll(calculateCRC32Partitions(cacheWorkDir));
+                    });
+                }
+            }
+        });
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+        Map<String, Integer> snpPartCRCs = new HashMap<>();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted())
+                ig0.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(), cntr.incrementAndGet());
+        }, 5, "cache-loader-");
+
+        try {
+            // Snapshot must be taken on node1 and transmitted to node0.
+            IgniteInternalFuture<?> fut = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                new BiConsumer<File, GroupPartitionId>() {
+                    @Override public void accept(File file, GroupPartitionId gprPartId) {
+                        log.info("Snapshot partition received successfully [rmtNodeId=" + rmtNodeId +
+                            ", part=" + file.getAbsolutePath() + ", gprPartId=" + gprPartId + ']');
+
+                        cancelLatch.countDown();
+                    }
+                });
+
+            cancelLatch.await();
+
+            fut.cancel();
+
+            IgniteInternalFuture<?> fut2 = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                (part, pair) -> {
+                    try {
+                        snpPartCRCs.put(part.getName(), FastCrc.calcCrc(part));
+                    }
+                    catch (IOException e) {
+                        throw new IgniteException(e);
+                    }
+                });
+
+            fut2.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        assertEquals("Partitions from remote node must have the same CRCs as those which have been received",
+            rmtPartCRCs, snpPartCRCs);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteOnBothNodes() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        forceCheckpoint(ig0);
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+        IgniteSnapshotManager mgr1 = snp(grid(1));
+
+        UUID node0 = grid(0).localNode().id();
+        UUID node1 = grid(1).localNode().id();
+
+        Map<Integer, Set<Integer>> fromNode1 = owningParts(ig0,
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node1);
+
+        Map<Integer, Set<Integer>> fromNode0 = owningParts(grid(1),
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> futFrom1To0 = mgr0.requestRemoteSnapshot(node1, fromNode1,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode1.get(pair.getGroupId())
+                    .remove(pair.getPartitionId())));
+        IgniteInternalFuture<?> futFrom0To1 = mgr1.requestRemoteSnapshot(node0, fromNode0,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode0.get(pair.getGroupId())
+                .remove(pair.getPartitionId())));
+
+        futFrom0To1.get();
+        futFrom1To0.get();
+
+        assertTrue("Not all of partitions have been received: " + fromNode1,
+            fromNode1.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+        assertTrue("Not all of partitions have been received: " + fromNode0,
+            fromNode0.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = ClusterTopologyCheckedException.class)
+    public void testRemoteSnapshotRequestedNodeLeft() throws Exception {
+        IgniteEx ig0 = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+        IgniteEx ig1 = startGrid(1);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        CountDownLatch hold = new CountDownLatch(1);
+
+        ((GridCacheDatabaseSharedManager)ig1.context().cache().context().database())
+            .addCheckpointListener(new DbCheckpointListener() {
+                /** {@inheritDoc} */
+                @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
+                    // Listener will be executed inside the checkpoint thread.
+                    U.await(hold);
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onMarkCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+            });
+
+        UUID rmtNodeId = ig1.localNode().id();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        snp(ig0).requestRemoteSnapshot(rmtNodeId, parts, (part, grp) -> {});
+
+        IgniteInternalFuture<?>[] futs = new IgniteInternalFuture[1];
+
+        assertTrue(GridTestUtils.waitForCondition(new GridAbsPredicate() {
+            @Override public boolean apply() {
+                IgniteInternalFuture<Boolean> snpFut = snp(ig1)
+                    .lastScheduledRemoteSnapshotTask(ig0.localNode().id());
+
+                if (snpFut == null)
+                    return false;
+                else
+                    futs[0] = snpFut;
+
+                return true;
+            }
+        }, 5_000L));
+
+        stopGrid(0);
+
+        hold.countDown();
+
+        futs[0].get();
+    }
+
+    /**
+     * <pre>
+     * 1. Start 2 nodes.
+     * 2. Request a snapshot from the 2-nd node.
+     * 3. Block the snapshot-request message.
+     * 4. Start a 3-rd node and change the BLT.
+     * 5. Stop the 3-rd node and change the BLT.
+     * 6. The 2-nd node now has MOVING partitions to be preloaded.
+     * 7. Release the snapshot-request message.
+     * 8. Snapshot creation should fail since MOVING partitions cannot be snapshotted.
+     * </pre>
+     *
+     * @throws Exception If fails.
+     */
+    @Test(expected = IgniteCheckedException.class)
+    public void testRemoteOutdatedSnapshot() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        awaitPartitionMapExchange();
+
+        forceCheckpoint();
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> snpFut = mgr0.requestRemoteSnapshot(rmtNodeId,
+            owningParts(ig0, new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))), rmtNodeId),
+            (part, grp) -> {});
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .waitForBlocked();
+
+        startGrid(2);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        stopGrid(2);
+
+        TestRecordingCommunicationSpi.spi(grid(1))
+            .blockMessages((node, msg) ->  msg instanceof GridDhtPartitionDemandMessage);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);
+
+        snpFut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = IgniteCheckedException.class)
+    public void testLocalSnapshotOnCacheStopped() throws Exception {
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(1);
+
+        ig.cluster().state(ClusterState.ACTIVE);
+
+        awaitPartitionMapExchange();
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        CountDownLatch cpLatch = new CountDownLatch(1);
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(cpLatch);
+
+                            delegate.sendPart0(part, cacheDirName, pair, length);
 
 Review comment:
   Fixed.
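
For reference, the tests in the hunk above implement DbCheckpointListener several times with mostly no-op methods. A minimal adapter — a hypothetical helper, not part of this change — could host the shared no-ops, assuming the three-method interface shown in the hunks:

    /** Hypothetical adapter: defaults every DbCheckpointListener hook to a no-op. */
    abstract static class CheckpointListenerAdapter implements DbCheckpointListener {
        /** {@inheritDoc} */
        @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
            // No-op.
        }

        /** {@inheritDoc} */
        @Override public void onMarkCheckpointBegin(Context ctx) {
            // No-op.
        }

        /** {@inheritDoc} */
        @Override public void onCheckpointBegin(Context ctx) {
            // No-op.
        }
    }

A test would then override only the hook it actually needs, e.g. new CheckpointListenerAdapter() { @Override public void onCheckpointBegin(Context ctx) { /* assertions */ } }.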


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410148822
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -83,15 +84,12 @@
  * Cluster-wide snapshot test.
  */
 public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
-    /** Random instance. */
-    private static final Random R = new Random();
-
     /** Time to wait while rebalance may happen. */
     private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
 
     /** Cache configuration for test. */
-    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
-        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+    private static CacheConfiguration<Integer, Integer> atomicCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
 
 Review comment:
   Fixed.
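
Note that the new field name implies the default ATOMIC mode, while the cache itself is still called "txCacheName". A fully consistent variant — illustrative only, not what the patch contains — would rename the cache as well:

    private static CacheConfiguration<Integer, Integer> atomicCcfg =
        new CacheConfiguration<Integer, Integer>("atomicCacheName")
            .setAtomicityMode(CacheAtomicityMode.ATOMIC);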


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408971947
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
 ##########
 @@ -0,0 +1,881 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BooleanSupplier;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.pagemem.PageIdUtils;
+import org.apache.ignite.internal.pagemem.store.PageStore;
+import org.apache.ignite.internal.pagemem.store.PageWriteListener;
+import org.apache.ignite.internal.processors.cache.CacheGroupContext;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopology;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.lang.IgniteThrowableRunner;
+import org.apache.ignite.internal.util.tostring.GridToStringExclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.partDeltaFile;
+
+/**
+ *
+ */
+class SnapshotFutureTask extends GridFutureAdapter<Boolean> implements DbCheckpointListener {
+    /** Shared context. */
+    private final GridCacheSharedContext<?, ?> cctx;
+
+    /** Ignite logger */
 
 Review comment:
   Point


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408769636
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheProcessor.java
 ##########
 @@ -4042,6 +4047,10 @@ public void onDiscoveryEvent(
      * @return {@code True} if minor topology version should be increased.
      */
     public boolean onCustomEvent(DiscoveryCustomMessage msg, AffinityTopologyVersion topVer, ClusterNode node) {
+        if (msg instanceof InitMessage &&
 
 Review comment:
  Should be removed if the standard discovery workflow is used.


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410149142
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create client accounts, each with the initial balance.
+        for (int i = 0; i < clients; i++) {
+            eastCache.put(i, new Account(i, balance));
+            westCache.put(i, new Account(i, balance));
+        }
+
+        assertEquals("The initial summary value in all caches is not correct.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
+
+        forceCheckpoint();
+
+        IgniteInternalFuture<?> txLoadFut = GridTestUtils.runMultiThreadedAsync(
+            () -> {
+                ThreadLocalRandom rnd = ThreadLocalRandom.current();
+
+                int amount;
+
+                try {
+                    while (!stop.get()) {
+                        IgniteEx ignite = grid(rnd.nextInt(grids));
+                        IgniteCache<Integer, Account> east = ignite.cache("east");
+                        IgniteCache<Integer, Account> west = ignite.cache("west");
+
+                        amount = rnd.nextInt(transferLimit);
+
+                        try (Transaction tx = ignite.transactions().txStart()) {
+                            Integer id = rnd.nextInt(clients);
+
+                            Account acc0 = east.get(id);
+                            Account acc1 = west.get(id);
+
+                            acc0.balance -= amount;
+
+                            txStarted.countDown();
+
+                            acc1.balance += amount;
+
+                            east.put(id, acc0);
+                            west.put(id, acc1);
+
+                            tx.commit();
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, e);
+
+                    fail("Tx must not be failed.");
+                }
+            }, transferThreadCnt, "transfer-account-thread-");
+
+        try {
+            U.await(txStarted);
+
+            grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get();
+        }
+        finally {
+            stop.set(true);
+        }
+
+        txLoadFut.get();
+
+        assertEquals("The summary value should not changed during tx transfers.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
+
+        stopAllGrids();
+
+        IgniteEx snpIg0 = startGridsFromSnapshot(grids, SNAPSHOT_NAME);
+
+        assertEquals("The total amount of all cache values must not changed in snapshot.",
+            total, sumAllCacheValues(snpIg0, clients, eastCcfg.getName(), westCcfg.getName()));
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotWithCacheNodeFilter() throws Exception {
+        int grids = 4;
+
+        CacheConfiguration<Integer, Integer> ccfg = txCacheConfig(new CacheConfiguration<Integer, Integer>(DEFAULT_CACHE_NAME))
+            .setNodeFilter(node -> node.consistentId().toString().endsWith("1"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration()));
+
+        IgniteEx ig0 = grid(0);
+
+        ig0.cluster().baselineAutoAdjustEnabled(false);
+        ig0.cluster().state(ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.getOrCreateCache(ccfg).put(i, i);
+
+        forceCheckpoint();
 
 Review comment:
   Fixed.
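
The helper sumAllCacheValues is called in the hunk above but its body is outside the quoted range. A plausible sketch, assuming Account exposes a public int balance field as the transfer code suggests:

    /** Sums the balance of the first {@code clients} accounts in each given cache (sketch only). */
    private static int sumAllCacheValues(Ignite node, int clients, String... cacheNames) {
        int sum = 0;

        for (String cacheName : cacheNames) {
            IgniteCache<Integer, Account> cache = node.cache(cacheName);

            for (int id = 0; id < clients; id++)
                sum += cache.get(id).balance;
        }

        return sum;
    }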


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409024086
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix for files with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled due to the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) because extra buffers are redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
 
 Review comment:
   Fixed.
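
Putting the javadoc above together with the test code quoted earlier in this thread, the two operations are driven roughly as follows. This is a sketch assembled from the tests, not an authoritative API description; in particular, a null partition set standing for "all owned partitions" is an assumption drawn from how the tests populate the map:

    // 1. Cluster-wide snapshot via the public IgniteSnapshot facade (triggers PME).
    ignite.snapshot().createSnapshot("mySnapshot").get();

    // 2. Internal API: request partitions of a cache group from a remote node.
    Map<Integer, Set<Integer>> parts = new HashMap<>();
    parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null); // Assumed to mean all owned partitions.

    IgniteInternalFuture<?> fut = snapshotMgr.requestRemoteSnapshot(
        rmtNodeId,
        parts,
        (file, grpPartId) -> log.info("Received partition file: " + file.getAbsolutePath()));

    fut.get();

The locBuff field above would then be initialized per thread, e.g. ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(pageSize).order(ByteOrder.nativeOrder())) — again an assumption, since the initializer sits outside the quoted hunk.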


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409512534
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start grid node with data before each test.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included in the checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Collection of cache group and partition pairs to be snapshotted.
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently the only node in the cluster),
+        // so the snapshot partitions must contain the same data as the partitions left after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency is checked by starting a test node directly from the snapshot directory and
+     * verifying that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start grid node with data before each test.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; it must be included in the snapshot with the correct value multiplier.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, valMultiplier * i));
+
+            // Snapshot is still in the INIT state. beforeCheckpoint has been skipped
+            // because a checkpoint is already running, and we need to schedule the next one
+            // right after the current one completes.
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, SNAPSHOT_NAME));
+
+            snpFutTask.awaitStarted();
+
+            db.forceCheckpoint("snapshot is ready to be created")
+                .futureFor(CheckpointState.MARKER_STORED_TO_DISK)
+                .get();
+
+            // Change data after snapshot.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, 3 * i));
+
+            // Snapshot on the next checkpoint must copy pages to the delta file before writing them to a partition.
+            forceCheckpoint(ig);
+
+            slowCopy.countDown();
+
+            snpFutTask.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // Now we can stop the node and check the created snapshots.
+        stopGrid(0);
+
+        cleanPersistenceDir(ig.name());
+
+        // Start Ignite instance from snapshot directory.
+        IgniteEx ig2 = startGridsFromSnapshot(1, SNAPSHOT_NAME);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++) {
+            assertEquals("snapshot data consistency violation [key=" + i + ']',
+                i * valMultiplier, ((TestOrderItem)ig2.cache(DEFAULT_CACHE_NAME).get(i)).value);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitionNotEnoughSpace() throws Exception {
+        String err_msg = "Test exception. Not enough space.";
+        AtomicInteger throwCntr = new AtomicInteger();
+        RandomAccessFileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+        IgniteEx ig = startGridWithCache(dfltCacheCfg.setAffinity(new ZeroPartitionAffinityFunction()),
+            CACHE_KEYS_RANGE);
+
+        // Change data after backup.
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, 2 * i);
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+
+        IgniteSnapshotManager mgr = snp(ig);
+
+        mgr.ioFactory(new FileIOFactory() {
+            @Override public FileIO create(File file, OpenOption... modes) throws IOException {
+                FileIO fileIo = ioFactory.create(file, modes);
+
+                if (file.getName().equals(IgniteSnapshotManager.partDeltaFileName(0)))
+                    return new FileIODecorator(fileIo) {
+                        @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
+                            if (throwCntr.incrementAndGet() == 3)
+                                throw new IOException(errMsg);
+
+                            return super.writeFully(srcBuf);
+                        }
+                    };
+
+                return fileIo;
+            }
+        });
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        // Check the right exception thrown.
+        assertThrowsAnyCause(log,
+            snpFut::get,
+            IOException.class,
+            errMsg);
+    }
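
The fault-injection technique in the test above, decorating the file I/O and failing after a fixed number of writes, can be sketched in isolation with plain java.io streams. This is a minimal illustration, not part of the PR; the FaultyOutputStream name and the fail-on-Nth-write threshold are made up for the example:

    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.concurrent.atomic.AtomicInteger;

    /** Decorator which fails after a fixed number of writes, emulating "not enough space". */
    class FaultyOutputStream extends FilterOutputStream {
        private final AtomicInteger writes = new AtomicInteger();
        private final int failOn;

        FaultyOutputStream(OutputStream delegate, int failOn) {
            super(delegate);
            this.failOn = failOn;
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            // Count the write attempts and throw on the configured one.
            if (writes.incrementAndGet() == failOn)
                throw new IOException("Test exception. Not enough space.");

            out.write(b, off, len);
        }
    }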
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
+        String errMsg = "Test. Failed to copy partition: ";
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), new HashSet<>(Collections.singletonList(0)));
+
+        IgniteSnapshotManager mgr0 = snp(ig);
+
+        IgniteInternalFuture<?> fut = startLocalSnapshotTask(ig.context().cache().context(),
+            SNAPSHOT_NAME,
+            parts,
+            new DelegateSnapshotSender(log, mgr0.snapshotExecutorService(),
+                mgr0.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    if (pair.getPartitionId() == 0)
+                        throw new IgniteException(errMsg + pair);
+
+                    delegate.sendPart0(part, cacheDirName, pair, length);
+                }
+            });
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteException.class,
+            errMsg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemotePartitionsWithLoad() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        AtomicInteger cntr = new AtomicInteger();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, cntr.incrementAndGet());
+
+        GridCacheSharedContext<?, ?> cctx1 = grid(1).context().cache().context();
+        GridCacheDatabaseSharedManager db1 = (GridCacheDatabaseSharedManager)cctx1.database();
+
+        forceCheckpoint();
+
+        Map<String, Integer> rmtPartCRCs = new HashMap<>();
+        CountDownLatch cancelLatch = new CountDownLatch(1);
+
+        db1.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                SnapshotFutureTask task = cctx1.snapshotMgr().lastScheduledRemoteSnapshotTask(grid(0).localNode().id());
+
+                // Skip the first remote snapshot creation since it will be cancelled.
+                if (task == null || cancelLatch.getCount() > 0)
+                    return;
+
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(task,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty()) {
+                    assert rmtPartCRCs.isEmpty();
+
+                    // Calculate actual partition CRCs when the checkpoint finishes on this node.
+                    ctx.finishedStateFut().listen(f -> {
+                        File cacheWorkDir = ((FilePageStoreManager)grid(1).context().cache().context().pageStore())
+                            .cacheWorkDir(dfltCacheCfg);
+
+                        rmtPartCRCs.putAll(calculateCRC32Partitions(cacheWorkDir));
+                    });
+                }
+            }
+        });
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+        Map<String, Integer> snpPartCRCs = new HashMap<>();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted())
+                ig0.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(), cntr.incrementAndGet());
+        }, 5, "cache-loader-");
+
+        try {
+            // Snapshot must be taken on node1 and transmitted to node0.
+            IgniteInternalFuture<?> fut = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                new BiConsumer<File, GroupPartitionId>() {
+                    @Override public void accept(File file, GroupPartitionId gprPartId) {
+                        log.info("Snapshot partition received successfully [rmtNodeId=" + rmtNodeId +
+                            ", part=" + file.getAbsolutePath() + ", gprPartId=" + gprPartId + ']');
+
+                        cancelLatch.countDown();
+                    }
+                });
+
+            cancelLatch.await();
+
+            fut.cancel();
+
+            IgniteInternalFuture<?> fut2 = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                (part, pair) -> {
+                    try {
+                        snpPartCRCs.put(part.getName(), FastCrc.calcCrc(part));
+                    }
+                    catch (IOException e) {
+                        throw new IgniteException(e);
+                    }
+                });
+
+            fut2.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        assertEquals("Partitions from remote node must have the same CRCs as those which have been received",
+            rmtPartCRCs, snpPartCRCs);
+    }
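
The CRC comparison above hinges on a helper that computes one checksum per partition file in a cache work directory. A minimal sketch of such a helper with java.util.zip.CRC32 follows; the actual calculateCRC32Partitions test utility may differ in details (e.g. which files it includes or whether it skips file headers):

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.zip.CRC32;

    final class PartitionCrcs {
        /** Computes one CRC32 per "part-*" file found in the given cache work directory. */
        static Map<String, Integer> crc32Partitions(File cacheDir) throws IOException {
            Map<String, Integer> crcs = new HashMap<>();

            File[] parts = cacheDir.listFiles(f -> f.getName().startsWith("part-"));

            if (parts == null)
                return crcs; // Not a directory or an I/O error occurred.

            for (File part : parts) {
                CRC32 crc = new CRC32();
                byte[] buf = new byte[8192];

                // Stream the whole partition file through the checksum.
                try (InputStream in = Files.newInputStream(part.toPath())) {
                    for (int read; (read = in.read(buf)) != -1; )
                        crc.update(buf, 0, read);
                }

                crcs.put(part.getName(), (int)crc.getValue());
            }

            return crcs;
        }
    }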
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteOnBothNodes() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        forceCheckpoint(ig0);
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+        IgniteSnapshotManager mgr1 = snp(grid(1));
+
+        UUID node0 = grid(0).localNode().id();
+        UUID node1 = grid(1).localNode().id();
+
+        Map<Integer, Set<Integer>> fromNode1 = owningParts(ig0,
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node1);
+
+        Map<Integer, Set<Integer>> fromNode0 = owningParts(grid(1),
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> futFrom1To0 = mgr0.requestRemoteSnapshot(node1, fromNode1,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode1.get(pair.getGroupId())
+                    .remove(pair.getPartitionId())));
+        IgniteInternalFuture<?> futFrom0To1 = mgr1.requestRemoteSnapshot(node0, fromNode0,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode0.get(pair.getGroupId())
+                .remove(pair.getPartitionId())));
+
+        futFrom0To1.get();
+        futFrom1To0.get();
+
+        assertTrue("Not all of partitions have been received: " + fromNode1,
+            fromNode1.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+        assertTrue("Not all of partitions have been received: " + fromNode0,
+            fromNode0.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = ClusterTopologyCheckedException.class)
+    public void testRemoteSnapshotRequestedNodeLeft() throws Exception {
+        IgniteEx ig0 = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+        IgniteEx ig1 = startGrid(1);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        CountDownLatch hold = new CountDownLatch(1);
+
+        ((GridCacheDatabaseSharedManager)ig1.context().cache().context().database())
+            .addCheckpointListener(new DbCheckpointListener() {
+                /** {@inheritDoc} */
+                @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
+                    // Listener will be executed inside the checkpoint thread.
+                    U.await(hold);
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onMarkCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+            });
+
+        UUID rmtNodeId = ig1.localNode().id();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        snp(ig0).requestRemoteSnapshot(rmtNodeId, parts, (part, grp) -> {});
+
+        IgniteInternalFuture<?>[] futs = new IgniteInternalFuture[1];
+
+        assertTrue(GridTestUtils.waitForCondition(new GridAbsPredicate() {
+            @Override public boolean apply() {
+                IgniteInternalFuture<Boolean> snpFut = snp(ig1)
+                    .lastScheduledRemoteSnapshotTask(ig0.localNode().id());
+
+                if (snpFut == null)
+                    return false;
+                else
+                    futs[0] = snpFut;
+
+                return true;
+            }
+        }, 5_000L));
+
+        stopGrid(0);
+
+        hold.countDown();
+
+        futs[0].get();
+    }
+
+    /**
+     * <pre>
+     * 1. Start 2 nodes.
+     * 2. Request a snapshot from the 2nd node.
+     * 3. Block the snapshot-request message.
+     * 4. Start a 3rd node and change the BLT.
+     * 5. Stop the 3rd node and change the BLT.
+     * 6. The 2nd node now has MOVING partitions to be preloaded.
+     * 7. Release the snapshot-request message.
+     * 8. Snapshot creation should fail since MOVING partitions cannot be snapshotted.
+     * </pre>
+     *
+     * @throws Exception If fails.
+     */
+    @Test(expected = IgniteCheckedException.class)
+    public void testRemoteOutdatedSnapshot() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        awaitPartitionMapExchange();
+
+        forceCheckpoint();
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> snpFut = mgr0.requestRemoteSnapshot(rmtNodeId,
+            owningParts(ig0, new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))), rmtNodeId),
+            (part, grp) -> {});
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .waitForBlocked();
+
+        startGrid(2);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        stopGrid(2);
+
+        TestRecordingCommunicationSpi.spi(grid(1))
+            .blockMessages((node, msg) ->  msg instanceof GridDhtPartitionDemandMessage);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);
+
+        snpFut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = IgniteCheckedException.class)
+    public void testLocalSnapshotOnCacheStopped() throws Exception {
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(1);
+
+        ig.cluster().state(ClusterState.ACTIVE);
+
+        awaitPartitionMapExchange();
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        CountDownLatch cpLatch = new CountDownLatch(1);
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(cpLatch);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        IgniteCache<?, ?> cache = ig.getOrCreateCache(DEFAULT_CACHE_NAME);
+
+        cache.destroy();
+
+        cpLatch.countDown();
+
+        snpFut.get(5_000, TimeUnit.MILLISECONDS);
+    }
+
+    /**
+     * @param src Source node to calculate.
+     * @param grps Groups to collect owning parts.
+     * @param rmtNodeId Remote node id.
+     * @return Map of collected parts.
+     */
+    private static Map<Integer, Set<Integer>> owningParts(IgniteEx src, Set<Integer> grps, UUID rmtNodeId) {
+        Map<Integer, Set<Integer>> result = new HashMap<>();
+
+        for (Integer grpId : grps) {
+            Set<Integer> parts = src.context()
+                .cache()
+                .cacheGroup(grpId)
+                .topology()
+                .partitions(rmtNodeId)
+                .entrySet()
+                .stream()
+                .filter(p -> p.getValue() == GridDhtPartitionState.OWNING)
+                .map(Map.Entry::getKey)
+                .collect(Collectors.toSet());
+
+            result.put(grpId, parts);
+        }
+
+        return result;
+    }
+
+    /**
+     * @param cctx Cache shared context.
+     * @param snpName Unique snapshot name.
+     * @param parts Collection of cache group and partition pairs to be snapshotted.
+     * @param snpSndr Sender used for snapshot sub-task processing.
+     * @return Future which will be completed when snapshot is done.
+     */
+    private static SnapshotFutureTask startLocalSnapshotTask(
+        GridCacheSharedContext<?, ?> cctx,
+        String snpName,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) throws IgniteCheckedException {
+        SnapshotFutureTask snpFutTask = cctx.snapshotMgr().registerSnapshotTask(snpName, cctx.localNodeId(), parts, snpSndr);
+
+        snpFutTask.start();
+
+        // Snapshot is still in the INIT state. beforeCheckpoint has been skipped
+        // because a checkpoint is already running, so we need to schedule the next one
+        // right after the current one completes.
+        cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, snpName));
+
+        snpFutTask.awaitStarted();
+
+        return snpFutTask;
+    }
+
+    /** */
+    private static class ZeroPartitionAffinityFunction extends RendezvousAffinityFunction {
+        @Override public int partition(Object key) {
+            return 0;
+        }
+    }
+
+    /** */
+    private static class TestOrderItem implements Serializable {
+        /** Serial version. */
+        private static final long serialVersionUID = 0L;
+
+        /** Order key. */
+        private final int key;
+
+        /** Order value. */
+        private final int value;
+
+        public TestOrderItem(int key, int value) {
 
 Review comment:
   Comment is absent


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r407978321
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that enforces a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since that is redundant and can lead to OOM errors. A direct
+     * buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory may run out before
+     * that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
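
With the usual Ignite store naming, a PART_FILE_TEMPLATE of "part-%d.bin" and an INDEX_FILE_NAME of "index.bin" (values assumed here, verify against FilePageStoreManager), the delta-file naming resolves as in this sketch:

    // Assumed values mirroring FilePageStoreManager constants (verify against the actual source).
    String partFileTemplate = "part-%d.bin";
    String indexFileName = "index.bin";
    String deltaSuffix = ".delta";

    // partDeltaFileName(12) would resolve to "part-12.bin.delta".
    String part12Delta = String.format(partFileTemplate + deltaSuffix, 12);

    // partDeltaFileName(INDEX_PARTITION) would resolve to "index.bin.delta".
    String idxDelta = indexFileName + deltaSuffix;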
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Traverses the snapshot directory for the given local node folder name and
+     * recursively deletes all its files, tolerating concurrent file removals.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with the consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from the file tree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
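
A usage sketch for the cleanup above; the path and consistent id below are illustrative, and U.maskForFileName is the IgniteUtils helper referenced in the javadoc:

    // Remove everything the given node contributed to snapshot "backup1"; the method
    // also removes the snapshot directory itself once its db folder becomes empty.
    File snpDir = new File("/storage/snapshots/backup1");      // Illustrative path.
    String folderName = U.maskForFileName("node00-afe45674");  // Illustrative consistent id.

    IgniteSnapshotManager.deleteSnapshot(snpDir, folderName);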
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group and partition pairs to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void startLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> endLocalSnapshot(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void endLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation started.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "The previous snapshot operation was not completed."));
+            }
+
+            if (clusterSnpRq != null) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Parallel snapshot processes are not allowed."));
+            }
+
+            if (getSnapshots().contains(name))
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Snapshot with given name already exists."));
+
+            snpFut0 = new GridFutureAdapter<>();
+
+            clusterSnpFut = snpFut0;
+        }
+
+        List<Integer> grps = cctx.cache().persistentGroups().stream()
+            .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+            .filter(g -> !g.config().isEncryptionEnabled())
+            .map(CacheGroupDescriptor::groupId)
+            .collect(Collectors.toList());
+
+        List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+        startSnpProc.start(UUID.randomUUID(), new SnapshotOperationRequest(cctx.localNodeId(),
+            name,
+            grps,
+            new HashSet<>(F.viewReadOnly(srvNodes,
+                F.node2id(),
+                (node) -> CU.baselineNode(node, clusterState)))));
+
+        if (log.isInfoEnabled())
+            log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+        return new IgniteFutureImpl<>(snpFut0);
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange is started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Collection of cache group ids with the sets of their partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Collection of cache group ids with the sets of their partitions to be snapshotted.
+     * @param snpSndr Snapshot sender instance used to transfer snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender localSnapshotSender(String snpName) {
+        return new LocalSnapshotSender(snpName);
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmissionSender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> igniteCacheStoragePath(pdsSettings),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id from which requests have been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param pcfg Persistent data storage folder settings.
+     * @return Relative path to the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String igniteCacheStoragePath(PdsFolderSettings pcfg) {
+        return Paths.get(DB_DEFAULT_FOLDER, pcfg.folderName()).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static Path snapshotPath(IgniteConfiguration cfg) {
 
 Review comment:
   Also, this method is used only once in production code and the result is converted with toFile(); it's better to return File instead of Path. In test code toString() is called on the result, so there will be no difference for tests.
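
   A hypothetical sketch of the suggested change (illustrative only, not the PR code; the resolution details are simplified here):

       static File snapshotPath(IgniteConfiguration cfg) {
           // Resolve the snapshot directory under the Ignite work directory and convert
           // once here, so the production caller no longer needs the toFile() conversion.
           return Paths.get(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY).toFile();
       }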

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408779751
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/MarshallerContextImpl.java
 ##########
 @@ -168,27 +170,73 @@ private void initializeCaches() {
     }
 
     /**
-     * @param platformId Platform id.
-     * @param marshallerMappings All marshaller mappings for given platformId.
-     * @throws IgniteCheckedException In case of failure to process incoming marshaller mappings.
+     * @param log Ignite logger.
+     * @param mappings All marshaller mappings to write.
      */
-    public void onMappingDataReceived(byte platformId, Map<Integer, MappedName> marshallerMappings)
-        throws IgniteCheckedException
-    {
-        ConcurrentMap<Integer, MappedName> platformCache = getCacheFor(platformId);
+    public void onMappingDataReceived(IgniteLogger log, List<Map<Integer, MappedName>> mappings) {
+        addPlatformMappings(log,
+            mappings,
+            this::getCacheFor,
+            (mappedName, clsName) ->
+                mappedName == null || F.isEmpty(clsName) || !clsName.equals(mappedName.className()),
+            fileStore);
+    }
+
+    /**
+     * @param ctx Kernal context.
+     * @param mappings Marshaller mappings to save.
+     * @param dir Directory to save given mappings to.
+     */
+    public static void saveMappings(GridKernalContext ctx, List<Map<Integer, MappedName>> mappings, File dir) {
+        MarshallerMappingFileStore writer = new MarshallerMappingFileStore(ctx,
+            mappingFileStoreWorkDir(dir.getAbsolutePath()));
+
+        addPlatformMappings(ctx.log(MarshallerContextImpl.class),
+            mappings,
+            b -> new ConcurrentHashMap<>(),
+            (mappedName, clsName) -> true,
+            writer);
+    }
+
+    /**
+     * @param log Ignite logger.
+     * @param mappings List of marshaller mappings per platform id.
+     * @param mappedCache Cache to attach new mappings to.
+     * @param cacheAddPred Predicate to check whether a mapping can be added.
+     * @param writer Persistence mapping writer.
+     */
+    private static void addPlatformMappings(
+        IgniteLogger log,
+        List<Map<Integer, MappedName>> mappings,
+        Function<Byte, ConcurrentMap<Integer, MappedName>> mappedCache,
+        BiPredicate<MappedName, String> cacheAddPred,
+        MarshallerMappingFileStore writer
+    ) {
+        if (mappings == null)
+            return;
+
+        for (byte platformId = 0; platformId < mappings.size(); platformId++) {
+            Map<Integer, MappedName> attach = mappings.get(platformId);
 
-        for (Map.Entry<Integer, MappedName> e : marshallerMappings.entrySet()) {
-            int typeId = e.getKey();
-            String clsName = e.getValue().className();
+            if (attach == null)
+                return;
 
 Review comment:
   Originally, null mappings were skipped but didn't break processing. I think `continue` should be used there.
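
   A sketch of the loop with the suggested fix (the processing of the mappings is elided):

       for (byte platformId = 0; platformId < mappings.size(); platformId++) {
           Map<Integer, MappedName> attach = mappings.get(platformId);

           // Skip a missing platform mapping instead of aborting the whole loop.
           if (attach == null)
               continue;

           // ... process mappings in 'attach' for this platformId as before ...
       }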

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408714030
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template of a partition file with delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in millisecond for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled due to the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that is redundant and can lead to OOM errors. A direct buffer
+     * is deallocated only when the ByteBuffer is garbage collected, but off-heap memory can be exhausted before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on its future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread, so no ordering guarantees are violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Cache group IDs mapped to the sets of their partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks failed, or they have not been executed " +
+                    "because some of the nodes left the cluster. The uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not have finished completely or finalizing results failed " +
+                        "[hasErr=" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for the cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
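A minimal usage sketch of the public API implemented above (assuming the facade added by this change is reachable via Ignite#snapshot(); the config path and snapshot name are illustrative):

    Ignite ignite = Ignition.start("examples/config/example-ignite.xml");

    // Blocks until the cluster-wide snapshot completes on all baseline nodes.
    ignite.snapshot().createSnapshot("backup_20200401").get();
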
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which has not been completed due to a local node crash must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create a snapshot failed due to the local node crash. All resources " +
+                "related to the snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Failed to wait until the cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Cache group IDs mapped to the sets of their partitions to be included in the snapshot.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception: " + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
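A hypothetical caller-side sketch for the internal API above (the cache name, the partition set and the snapshotMgr() accessor are illustrative assumptions):

    Map<Integer, Set<Integer>> parts = new HashMap<>();
    parts.put(CU.cacheId("someCache"), Collections.singleton(0));

    IgniteInternalFuture<Void> fut = cctx.snapshotMgr().createRemoteSnapshot(
        rmtNodeId,
        parts,
        (file, pair) -> log.info("Received partition file [file=" + file + ", pair=" + pair + ']'));

    // Wait until all requested partitions are received and recovered.
    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
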
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Cache group IDs mapped to the sets of their partitions to be included in the snapshot.
+     * @param snpSndr Snapshot sender instance used to process the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return locSndrFactory;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which snapshot requests have been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
 
 Review comment:
   Use just one `U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);`
   Set `DFLT_SNAPSHOT_DIRECTORY` as `getSnapshotPath` default value (not null), fix `getSnapshotPath` and `setSnapshotPath` javadoc
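   A minimal sketch of the suggested simplification (assuming `DFLT_SNAPSHOT_DIRECTORY` becomes the non-null default value of `getSnapshotPath()`, as proposed above):

       static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
           try {
               // getSnapshotPath() is assumed to never return null once the default is set.
               return U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
           }
           catch (IgniteCheckedException e) {
               throw new IgniteException(e);
           }
       }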

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410250317
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.lang.GridClosureException;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in millisecond for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = LocalSnapshotSender::new;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpReq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult, SnapshotStartDiscoveryMessage::new);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
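For illustration, with PART_FILE_TEMPLATE = "part-%d.bin" and INDEX_FILE_NAME = "index.bin" (as defined in FilePageStoreManager), the naming scheme above yields:

    partDeltaFileName(12);              // "part-12.bin.delta"
    partDeltaFileName(INDEX_PARTITION); // "index.bin.delta"
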
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time when the last cluster snapshot request started on this node.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time when the last cluster snapshot request ended on this node.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of the last started cluster snapshot request on this node.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of the last started cluster snapshot request which failed with an error. " +
+                "This value will be 'null' if the last snapshot request completed successfully.");
+        mreg.register("LocalSnapshotList", this::getSnapshots, List.class,
+            "The list of names of all snapshots currently saved on the local node with respect to " +
+                "the snapshot working path configured via IgniteConfiguration.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process the request to create a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message with the snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from the remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpReq = clusterSnpReq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpReq != null &&
+                                snpReq.snpName.equals(sctx.snapshotName()) &&
+                                snpReq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("Snapshot operation interrupted. " +
+                                "One of baseline nodes left the cluster: " + leftNodeId));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
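+
+    /*
+     * A sketch (not part of the handlers above) of the recovery flow for a single
+     * received partition: the partition file arrives through the file handler first,
+     * then delta pages are applied on top of it by the chunk handler, page by page.
+     *
+     *   pageStore.beginRecover();
+     *   // For each received delta page buffer:
+     *   pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+     *   // Once all delta bytes have been transferred:
+     *   finishRecover(snpTrFut, grpPartId);
+     */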
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot dir.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
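+
+    /*
+     * Illustrative call (hypothetical "snapshotsRoot" and "consId" values):
+     *
+     *   deleteSnapshot(new File(snapshotsRoot, "backup1"), U.maskForFileName(consId));
+     */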
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpReq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpReq + ']'));
+        }
+
+        Set<UUID> leftNodes = new HashSet<>(req.bltNodes);
+        leftNodes.removeAll(F.viewReadOnly(cctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            F.node2id()));
+
+        if (!leftNodes.isEmpty()) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Some of baseline nodes left the cluster " +
+                "prior to snapshot operation start: " + leftNodes));
+        }
+
+        Set<Integer> leftGrps = new HashSet<>(req.grpIds);
+        leftGrps.removeAll(cctx.cache().cacheGroupDescriptors().keySet());
+
+        if (!leftGrps.isEmpty()) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Some of requested cache groups doesn't exist " +
+                "on the local node [missed=" + leftGrps + ", nodeId=" + cctx.localNodeId() + ']'));
+        }
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        // Prepare the collection of cache groups and their partitions to be snapshot.
+        // The cache group context may be 'null' on some nodes, e.g. when a node filter is set.
+        for (Integer grpId : req.grpIds) {
+            if (cctx.cache().cacheGroup(grpId) == null)
+                continue;
+
+            parts.put(grpId, null);
+        }
+
+        if (parts.isEmpty())
+            return new GridFinishedFuture<>();
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpReq = req;
+
+        return task0.chain(fut -> {
+            if (fut.error() == null)
+                return new SnapshotOperationResponse();
+            else
+                throw new GridClosureException(fut.error());
+        });
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        if (cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpReq = clusterSnpReq;
+
+        if (snpReq == null || !snpReq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation has not been fully completed " +
+                        "[err=" + err + ", snpReq=" + snpReq + ']'));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpReq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpReq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpReq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpReq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpReq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpReq = clusterSnpReq;
+
+        if (snpReq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpReq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpReq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpReq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpReq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr=" + snpReq.hasErr + ", fail=" + endFail + ", err=" + err + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpReq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpReq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpReq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
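+
+    /*
+     * Usage sketch (assumes a started server node referenced by "ignite"):
+     *
+     *   IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backup1");
+     *   fut.get(); // Blocks until the cluster-wide snapshot operation completes.
+     */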
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // Snapshot which has not been completed due to the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
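+
+    /*
+     * Recovery contract illustrated above: SNP_RUNNING_KEY is written to the metastorage
+     * before a local snapshot starts and removed once the operation completes, so a
+     * non-null value observed on startup means the previous attempt crashed mid-way
+     * and its partial files can be safely deleted.
+     */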
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpReq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpReq = clusterSnpReq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpReq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, snpReq.snpName));
+
+            // Schedule task on a checkpoint and wait when it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group IDs to the sets of partition IDs to be snapshot.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when requested snapshot fully received.
+     */
+    public IgniteInternalFuture<Void> requestRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
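+
+    /*
+     * Illustrative request (hypothetical "snpMgr", cache name and node id): fetch
+     * partition 0 of cache group "someCache" from a remote node and log each received file.
+     *
+     *   snpMgr.requestRemoteSnapshot(rmtNodeId,
+     *       F.asMap(CU.cacheId("someCache"), Collections.singleton(0)),
+     *       (file, pair) -> log.info("Received: " + file + ", " + pair));
+     */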
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted because some of the required " +
+                    "cache groups were stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group IDs to the sets of partition IDs to be snapshot.
+     * @param snpSndr Snapshot sender instance used to process the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void localSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return locSndrFactory;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which a snapshot request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot directory resolved through given configuration.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storages were processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close all non-finished file storages.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks sequentially, though not necessarily on a single thread:
+     * each task may execute on a different thread of the delegate executor, but never
+     * concurrently with another. This matters for some {@link SnapshotSender}s whose
+     * sub-tasks share a single socket channel to send data to.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled before the wrapped executor is shut down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
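+
+    /*
+     * Ordering sketch for the wrapper above: even when the delegate is a multi-threaded
+     * pool, submitted tasks never overlap (the "sendChunk" calls are hypothetical).
+     *
+     *   Executor seq = new SequentialExecutorWrapper(log, snpRunner);
+     *   seq.execute(() -> sendChunk(1)); // Completes before the next task starts,
+     *   seq.execute(() -> sendChunk(2)); // possibly on a different pool thread.
+     */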
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partition has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved under the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409038442
 
 

 ##########
 File path: modules/platforms/dotnet/Apache.Ignite.Core.Tests/Services/ServicesTest.cs
 ##########
 @@ -870,7 +870,7 @@ public void TestCallJavaService()
                 binSvc.testBinaryObject(
                     Grid1.GetBinary().ToBinary<IBinaryObject>(new PlatformComputeBinarizable {Field = 6}))
                     .GetField<int>("Field"));
-            
+
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408327850
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1906 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested the operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for loaded snapshots from the remote nodes and storing
+     * temporary partition delta-files of locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
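+
+    /*
+     * The two distributed processes wired above form a two-phase flow: START_SNAPSHOT
+     * runs initLocalSnapshotStartStage(...) on every baseline node and collects the
+     * per-node results, then END_SNAPSHOT finalizes the snapshot via
+     * initLocalSnapshotEndStage(...), deleting uncompleted files if any node reported
+     * an error in the first phase.
+     */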
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
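+
+    /*
+     * For example, assuming the standard "part-%d.bin" and "index.bin" page store
+     * names: partition 5 maps to "part-5.bin.delta", and the index partition maps
+     * to "index.bin.delta".
+     */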
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
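+                // Put the store into recovery mode: the delta pages received next are applied on top of the base partition file.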
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
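+                            // Apply the received delta page; its write offset inside the store is derived from the page id.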
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it is not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverses the snapshot directory for the given local node folder name and
+     * recursively deletes all files from it, if they exist.
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see U.maskForFileName with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of cache partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if a snapshot operation is in progress.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
 
 Review comment:
  I think that's not true. There are two points to take into account:
  - Multiple distributed processes can run simultaneously;
  - `initLocalSnapshotStartStage` runs through a discovery message;
   
   So the first snapshot operation will be able to complete successfully, but the second one must fail. We should check the request ids here, you're right.
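
  For illustration, here is a minimal sketch of the request-id check discussed above. It is a sketch only, not the PR code: the `rqId` field on `SnapshotOperationRequest` and the `registerTaskAndChain()` helper are hypothetical names introduced purely for this example.

      // Sketch only: guard initLocalSnapshotStartStage() against a concurrent
      // start request. Both rqId (a unique per-operation request id) and
      // registerTaskAndChain() are hypothetical names for illustration.
      private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(
          SnapshotOperationRequest req) {
          SnapshotOperationRequest curr = clusterSnpRq;

          // Reject the request if a different snapshot operation is already running.
          if (curr != null && !curr.rqId.equals(req.rqId)) {
              return new GridFinishedFuture<>(new IgniteCheckedException(
                  "Snapshot operation has been rejected. Another snapshot operation " +
                      "is in progress [req=" + req + ", curr=" + curr + ']'));
          }

          clusterSnpRq = req;

          return registerTaskAndChain(req);
      }

  With such a guard, the first START_SNAPSHOT process completes normally while a second, concurrently started one fails on every baseline node.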


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409104510
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}): the latter is redundant and can lead to OOM errors. A direct
+     * buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
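
For readers following the handlers above: fileHandler() first materializes the
partition file, then chunkHandler() streams the delta pages recorded during the
checkpoint and merges them page by page until the expected byte count is reached,
at which point finishRecover() closes the store and notifies the part consumer.
A minimal, self-contained sketch of that merge loop (all names below are
illustrative placeholders, not Ignite API):

    import java.nio.ByteBuffer;

    /** Illustrative two-phase recovery: a base partition copy plus streamed delta pages. */
    class DeltaRecoverySketch {
        private boolean recovering;
        private long transferred;

        void beginRecover() { recovering = true; }          // chunkHandler() entry

        void applyChunk(ByteBuffer page, long expected) {
            assert recovering : "Delta pages may only be applied while recovering";

            // A real implementation writes the newer page version over the base copy.
            transferred += page.capacity();

            if (transferred == expected)
                finishRecover();                            // last delta page received
        }

        void finishRecover() { recovering = false; }        // partsLeft--, consumer notified
    }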
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try stop all snapshot processing if not yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Concurrently traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
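
A hypothetical cleanup call for the method above (the manager reference and the
snapshot name are placeholders; the folder name derivation follows U.maskForFileName
on the node's consistent id, as the javadoc notes):

    // Illustrative only: remove the local copy of a snapshot for this node.
    File snpDir = snpMgr.snapshotLocalDir("backup_2020_04");

    IgniteSnapshotManager.deleteSnapshot(snpDir, pdsSettings.folderName());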
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids and the cache partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
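
A minimal usage sketch of the entry point above, assuming the manager backs a
public Ignite#snapshot() facade (the facade wiring is an assumption of this
sketch and is not shown in this hunk):

    // Illustrative only; the configuration path and snapshot name are placeholders.
    try (Ignite ignite = Ignition.start("config.xml")) {
        ignite.cluster().active(true);

        // Starts the distributed start/end stages shown above and blocks until
        // every baseline node finishes its local snapshot task.
        ignite.snapshot().createSnapshot("backup_2020_04").get();
    }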
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // Snapshot which has not been completed due to the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids and the cache partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // The null check must precede the feature check to avoid an NPE when the node has left.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
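
For illustration, a hedged sketch of requesting partitions from another node via
the method above (the manager reference, node id and cache name are placeholders):

    // Illustrative only: fetch partition 0 of one cache group from a remote node.
    Map<Integer, Set<Integer>> parts = new HashMap<>();
    parts.put(CU.cacheId("someCache"), Collections.singleton(0));

    IgniteInternalFuture<Void> fut = snpMgr.createRemoteSnapshot(
        rmtNodeId,
        parts,
        (file, pair) -> log.info("Partition received [file=" + file + ", pair=" + pair + ']'));

    fut.get(); // completes when all requested partitions have been received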
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group ids and the cache partitions to be snapshotted.
+     * @param snpSndr Snapshot sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id from which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name.
+     * @return Relative configured path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}.
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request the snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storage processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close file storages which have not finished yet.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks strictly sequentially, though not necessarily on a
+     * single thread. This is important for some {@link SnapshotSender}s which must
+     * process their sub-tasks sequentially because all these sub-tasks may share a
+     * single socket channel to send data to.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of task to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
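
The same serialization pattern in a self-contained, runnable form for reference
(a sketch, not part of the patch):

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Executor;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /** Tasks submitted to a multi-threaded pool still run strictly one after another. */
    class SerializingExecutorDemo implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor delegate;
        private Runnable active;

        SerializingExecutorDemo(Executor delegate) { this.delegate = delegate; }

        @Override public synchronized void execute(Runnable r) {
            // Wrap the task so that its completion schedules the next queued one.
            tasks.offer(() -> { try { r.run(); } finally { next(); } });

            if (active == null)
                next();
        }

        private synchronized void next() {
            if ((active = tasks.poll()) != null)
                delegate.execute(active);
        }

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            Executor seq = new SerializingExecutorDemo(pool);
            CountDownLatch done = new CountDownLatch(5);

            for (int i = 0; i < 5; i++) {
                int n = i;

                seq.execute(() -> {
                    System.out.println("task " + n); // prints 0..4 strictly in order
                    done.countDown();
                });
            }

            done.await();
            pool.shutdown();
        }
    }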
+
+    /**
+     * Snapshot sender which transfers cache partition files and delta pages to a remote node.
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to the local snapshot directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory calculated from the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Copy from file.
+         * @param to Copy data to file.
+         * @param length Number of bytes to copy from beginning.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if an execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param rqId Unique snapshot request id.
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID rqId, UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.rqId = rqId;
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotOperationRequest.class, this);
+        }
+    }
+
+    /** Empty response for the distributed snapshot process stages. */
+    private static class SnapshotOperationResponse implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+    }
+
+    /** Snapshot operation start message. */
+    private static class SnapshotStartDiscoveryMessage implements SnapshotDiscoveryMessage {
+        /** Serial version UID. */
+        private static final long serialVersionUID = 0L;
+
+        /** Discovery cache. */
+        private final DiscoCache discoCache;
+
+        /** Snapshot request id. */
+        private final IgniteUuid id;
+
+        /**
+         * @param discoCache Discovery cache.
+         * @param id Snapshot request id.
+         */
+        public SnapshotStartDiscoveryMessage(DiscoCache discoCache, UUID id) {
+            this.discoCache = discoCache;
+            this.id = new IgniteUuid(id, 0);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needExchange() {
+            return true;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needAssignPartitions() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public IgniteUuid id() {
+            return id;
+        }
+
+        /** {@inheritDoc} */
+        @Override public @Nullable DiscoveryCustomMessage ackMessage() {
+            return null;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean isMutable() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public DiscoCache createDiscoCache(GridDiscoveryManager mgr, AffinityTopologyVersion topVer,
+            DiscoCache discoCache) {
+            return this.discoCache;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            SnapshotStartDiscoveryMessage message = (SnapshotStartDiscoveryMessage)o;
+
+            return id.equals(message.id);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(id);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotStartDiscoveryMessage.class, this);
+        }
+    }
+
+    /** Cluster-wide snapshot operation future. */
+    private static class ClusterSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Snapshot name */
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410185944
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
 ##########
 @@ -0,0 +1,881 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BooleanSupplier;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.pagemem.PageIdUtils;
+import org.apache.ignite.internal.pagemem.store.PageStore;
+import org.apache.ignite.internal.pagemem.store.PageWriteListener;
+import org.apache.ignite.internal.processors.cache.CacheGroupContext;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopology;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.lang.IgniteThrowableRunner;
+import org.apache.ignite.internal.util.tostring.GridToStringExclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.partDeltaFile;
+
+/**
+ * Snapshot task which creates a consistent copy of the requested cache group
+ * partitions on the local node at the moment of a checkpoint.
+ */
+class SnapshotFutureTask extends GridFutureAdapter<Boolean> implements DbCheckpointListener {
+    /** Shared context. */
+    private final GridCacheSharedContext<?, ?> cctx;
+
+    /** Ignite logger. */
+    private final IgniteLogger log;
+
+    /** Node id which caused the snapshot operation. */
+    private final UUID srcNodeId;
+
+    /** Unique identifier of snapshot process. */
+    private final String snpName;
+
+    /** Snapshot working directory on file system. */
+    private final File tmpTaskWorkDir;
+
+    /** Local buffer to perform copy-on-write operations for {@link PageStoreSerialWriter}. */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** IO factory which will be used for creating snapshot delta-writers. */
+    private final FileIOFactory ioFactory;
+
+    /**
+     * Size of each cache partition file at the moment the snapshot is taken.
+     * The value is greater than zero only for partitions in the OWNING state.
+     * Collected under the checkpoint write lock.
+     */
+    private final Map<GroupPartitionId, Long> partFileLengths = new HashMap<>();
+
+    /**
+     * Map of partitions to snapshot and their corresponding delta page store writers.
+     * Writers are pinned to the snapshot task context to control the partition
+     * processing supplier.
+     */
+    private final Map<GroupPartitionId, PageStoreSerialWriter> partDeltaWriters = new HashMap<>();
+
+    /** Snapshot data sender. */
+    @GridToStringExclude
+    private final SnapshotSender snpSndr;
+
+    /**
+     * Requested map of cache groups and their partitions to include into the snapshot. If the set of
+     * partitions is {@code null}, then all OWNING partitions for the given cache group will be included.
+     * In that case, if all partitions are in the OWNING state, the index partition is included as well.
+     * <p>
+     * If partitions for a particular cache group are not provided, they will be collected and added
+     * on checkpoint under the write lock.
+     */
+    private final Map<Integer, Set<Integer>> parts;
+
+    /** Cache group and corresponding partitions collected under the checkpoint write lock. */
+    private final Map<Integer, Set<Integer>> processed = new HashMap<>();
+
+    /** Checkpoint end future. */
+    private final CompletableFuture<Boolean> cpEndFut = new CompletableFuture<>();
+
+    /** Future which completes when the checkpoint mark phase finishes and snapshot tasks are scheduled. */
+    private final GridFutureAdapter<Void> startedFut = new GridFutureAdapter<>();
+
+    /** Absolute path to the temporary snapshot storage directory. */
+    private File tmpSnpDir;
+
+    /** Future which completes when the task is requested to be closed. Executed on the system pool. */
+    private volatile CompletableFuture<Void> closeFut;
+
+    /** An exception which occurred during snapshot processing. */
+    private final AtomicReference<Throwable> err = new AtomicReference<>();
+
+    /** Flag indicating that the task has already been scheduled on checkpoint. */
+    private final AtomicBoolean started = new AtomicBoolean();
+
+    /**
+     * Creates a task which is already finished with the given exception.
+     *
+     * @param e Exception the finished task is completed with.
+     */
+    public SnapshotFutureTask(IgniteCheckedException e) {
+        A.notNull(e, "Exception for a finished snapshot task must not be null");
+
+        cctx = null;
+        log = null;
+        snpName = null;
+        srcNodeId = null;
+        tmpTaskWorkDir = null;
+        snpSndr = null;
+
+        err.set(e);
+        startedFut.onDone(e);
+        onDone(e);
+        parts = null;
+        ioFactory = null;
+        locBuff = null;
+    }
+
+    /**
+     * @param snpName Unique identifier of the snapshot task.
+     * @param ioFactory Factory for working with delta files.
+     * @param parts Map of cache groups and their partitions to include into the snapshot; if the set of
+     * partitions is {@code null}, then all OWNING partitions for the given cache group will be included.
+     */
+    public SnapshotFutureTask(
+        GridCacheSharedContext<?, ?> cctx,
+        UUID srcNodeId,
+        String snpName,
+        File tmpWorkDir,
+        FileIOFactory ioFactory,
+        SnapshotSender snpSndr,
+        Map<Integer, Set<Integer>> parts,
+        ThreadLocal<ByteBuffer> locBuff
+    ) {
+        A.notNull(snpName, "Snapshot name cannot be empty or null");
+        A.notNull(snpSndr, "Snapshot sender which handles execution tasks must not be null");
+        A.notNull(snpSndr.executor(), "Executor service must not be null");
+
+        this.parts = parts;
+        this.cctx = cctx;
+        this.log = cctx.logger(SnapshotFutureTask.class);
+        this.snpName = snpName;
+        this.srcNodeId = srcNodeId;
+        this.tmpTaskWorkDir = new File(tmpWorkDir, snpName);
+        this.snpSndr = snpSndr;
+        this.ioFactory = ioFactory;
+        this.locBuff = locBuff;
+    }
+
+    /**
+     * @return Snapshot name.
+     */
+    public String snapshotName() {
+        return snpName;
+    }
+
+    /**
+     * @return Node id which triggered this operation.
+     */
+    public UUID sourceNodeId() {
+        return srcNodeId;
+    }
+
+    /**
+     * @return Type of snapshot operation.
+     */
+    public Class<? extends SnapshotSender> type() {
+        return snpSndr.getClass();
+    }
+
+    /**
+     * @return Set of cache groups included into snapshot operation.
+     */
+    public Set<Integer> affectedCacheGroups() {
+        return parts.keySet();
+    }
+
+    /**
+     * @param th An exception which occurred during snapshot processing.
+     */
+    public void acceptException(Throwable th) {
+        if (th == null)
+            return;
+
+        if (err.compareAndSet(null, th))
+            closeAsync();
+
+        startedFut.onDone(th);
+
+        U.warn(log, "Snapshot task will be stopped due to an accepted exception: " + th);
+    }
+
+    /** {@inheritDoc} */
+    @Override public boolean onDone(@Nullable Boolean res, @Nullable Throwable err) {
+        for (PageStoreSerialWriter writer : partDeltaWriters.values())
+            U.closeQuiet(writer);
+
+        snpSndr.close(err);
+
+        if (tmpSnpDir != null)
+            U.delete(tmpSnpDir);
+
+        // Delete the task working directory if it is empty or the task has failed.
+        try {
+            if (U.fileCount(tmpTaskWorkDir.toPath()) == 0 || err != null)
+                U.delete(tmpTaskWorkDir.toPath());
+        }
+        catch (IOException e) {
+            log.error("Failed to delete the snapshot task working directory [snpName=" + snpName + ", dir=" + tmpTaskWorkDir + ']', e);
+        }
+
+        if (err != null)
+            startedFut.onDone(err);
+
+        return super.onDone(res, err);
+    }
+
+    /**
+     * @throws IgniteCheckedException If fails.
+     */
+    public void awaitStarted() throws IgniteCheckedException {
+        startedFut.get();
+    }
+
+    /**
+     * @return {@code true} if the current task has been requested to stop.
+     */
+    private boolean stopping() {
+        return err.get() != null;
+    }
+
+    /**
+     * Initiates snapshot task.
+     *
+     * @return {@code true} if task started by this call.
+     */
+    public boolean start() {
+        if (stopping())
+            return false;
+
+        try {
+            if (!started.compareAndSet(false, true))
+                return false;
+
+            tmpSnpDir = U.resolveWorkDirectory(tmpTaskWorkDir.getAbsolutePath(),
+                databaseRelativePath(cctx.kernalContext().pdsFolderResolver().resolveFolders().folderName()),
+                false);
+
+            for (Integer grpId : parts.keySet()) {
+                CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);
+
+                if (gctx == null)
+                    throw new IgniteCheckedException("Cache group context not found: " + grpId);
+
+                if (!CU.isPersistentCache(gctx.config(), cctx.kernalContext().config().getDataStorageConfiguration()))
+                    throw new IgniteCheckedException("In-memory cache groups are not allowed to be snapshotted: " + grpId);
+
+                if (gctx.config().isEncryptionEnabled())
+                    throw new IgniteCheckedException("Encrypted cache groups are not allowed to be snapshotted: " + grpId);
+
+                // Create cache group snapshot directory on start in a single thread.
+                U.ensureDirectory(cacheWorkDir(tmpSnpDir, cacheDirName(gctx.config())),
+                    "directory for snapshotting cache group",
+                    log);
+            }
+
+            startedFut.listen(f ->
+                ((GridCacheDatabaseSharedManager)cctx.database()).removeCheckpointListener(this)
+            );
+
+            // Listener will be removed right after first execution.
+            ((GridCacheDatabaseSharedManager)cctx.database()).addCheckpointListener(this);
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot operation is scheduled on local node and will be handled by the checkpoint " +
+                    "listener [sctx=" + this + ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+        }
+        catch (IgniteCheckedException e) {
+            acceptException(e);
+
+            return false;
+        }
+
+        return true;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void beforeCheckpointBegin(Context ctx) {
+        if (stopping())
+            return;
+
+        ctx.finishedStateFut().listen(f -> {
+            if (f.error() == null)
+                cpEndFut.complete(true);
+            else
+                cpEndFut.completeExceptionally(f.error());
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onMarkCheckpointBegin(Context ctx) {
+        // The write lock is held. Partition page counters have been collected under the write lock.
+        if (stopping())
+            return;
+
+        try {
+            for (Map.Entry<Integer, Set<Integer>> e : parts.entrySet()) {
+                int grpId = e.getKey();
+                Set<Integer> grpParts = e.getValue();
+
+                GridDhtPartitionTopology top = cctx.cache().cacheGroup(grpId).topology();
+
+                Iterator<GridDhtLocalPartition> iter;
+
+                if (grpParts == null)
+                    iter = top.currentLocalPartitions().iterator();
+                else {
+                    if (grpParts.contains(INDEX_PARTITION)) {
+                        throw new IgniteCheckedException("Index partition cannot be included into the snapshot if " +
+                            "the set of cache group partitions has been explicitly provided [grpId=" + grpId + ']');
+                    }
+
+                    iter = F.iterator(grpParts, top::localPartition, false);
+                }
+
+                Set<Integer> owning = new HashSet<>();
+                Set<Integer> missed = new HashSet<>();
+
+                // Iterate over partitions in particular cache group.
+                while (iter.hasNext()) {
+                    GridDhtLocalPartition part = iter.next();
+
+                    // A partition can be in the MOVING or RENTING state.
+                    // The index partition will be excluded if not all partitions are OWNING.
+                    // A partition with no data assigned to it has not been created yet.
+                    if (part.state() == GridDhtPartitionState.OWNING)
+                        owning.add(part.id());
+                    else
+                        missed.add(part.id());
+                }
+
+                if (grpParts != null) {
+                    // Partitions have been explicitly provided for the cache group, but some of them
+                    // are not in the OWNING state. Exit with an error.
+                    if (!missed.isEmpty()) {
+                        throw new IgniteCheckedException("Snapshot operation cancelled because not all " +
+                            "of the requested partitions have the OWNING state on the local node [grpId=" + grpId +
+                            ", missed=" + missed + ']');
+                    }
+                }
+                else {
+                    // Partitions have not been provided for the snapshot task; if all partitions have
+                    // the OWNING state, the index partition must be included into the snapshot as well.
+                    if (!missed.isEmpty()) {
+                        log.warning("Only local cache group partitions in the OWNING state have been included into the snapshot. " +
+                            "Partitions in other states have been skipped, as has the index partition " +
+                            "[snpName=" + snpName + ", grpId=" + grpId + ", missed=" + missed + ']');
+                    }
+                    else if (cctx.kernalContext().query().moduleEnabled())
+                        owning.add(INDEX_PARTITION);
+                }
+
+                processed.put(grpId, owning);
+            }
+
+            for (Map.Entry<Integer, Set<Integer>> e : processed.entrySet()) {
+                int grpId = e.getKey();
+
+                CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);
+
+                if (gctx == null) {
+                    throw new IgniteCheckedException("Cache group context not found " +
+                        "because the cache group has been stopped: " + grpId);
+                }
+
+                for (int partId : e.getValue()) {
+                    GroupPartitionId pair = new GroupPartitionId(grpId, partId);
+
+                    PageStore store = ((FilePageStoreManager)cctx.pageStore()).getStore(grpId, partId);
+
+                    partDeltaWriters.put(pair,
+                        new PageStoreSerialWriter(store,
+                            partDeltaFile(cacheWorkDir(tmpSnpDir, cacheDirName(gctx.config())), partId)));
+
+                    partFileLengths.put(pair, store.size());
+                }
+            }
+        }
+        catch (IgniteCheckedException e) {
+            acceptException(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onCheckpointBegin(Context ctx) {
+        if (stopping())
+            return;
+
+        // The snapshot task starts now, since the checkpoint write lock has been released.
+        if (!startedFut.onDone())
+            return;
+
+        assert !processed.isEmpty() : "Partitions to process must be collected under checkpoint mark phase";
+
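+        // Initialize the snapshot sender with the total number of partitions collected for processing.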
+        wrapExceptionIfStarted(() -> snpSndr.init(processed.values().stream().mapToInt(Set::size).sum()))
 
 Review comment:
   Do we really need to start other futures in case of failure on init?
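
   One way to avoid that, as a minimal sketch assuming a CompletableFuture-based pipeline
   (the names in the snippet are hypothetical, not the PR's actual API): chain the
   partition-copy stage onto the init stage, so the follow-up futures are never scheduled
   when init fails and the whole task is failed exactly once.

       import java.util.concurrent.CompletableFuture;
       import java.util.concurrent.Executor;
       import java.util.function.Consumer;

       /** Sketch: partition-copy futures are scheduled only when init succeeds. */
       class InitGuardSketch {
           static void startPipeline(Runnable init, Runnable partTasks, Executor exec,
               Consumer<Throwable> onErr) {
               CompletableFuture.runAsync(init, exec)
                   .thenRun(partTasks)      // never runs if init completes exceptionally
                   .exceptionally(t -> { onErr.accept(t); return null; });
           }
       }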


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408971159
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this
+ *     operation. Cache groups are transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of a file with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint which starts the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads which perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory
+     * may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
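+        // E.g. with the default templates, partition 12 resolves to "part-12.bin.delta" and the index partition to "index.bin.delta".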
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of the last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of the last started cluster snapshot operation which failed. This value will be " +
+                "'null' if the last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node, with respect to the " +
+                "snapshot path configured via IgniteConfiguration.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process the snapshot creation request " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message about a snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
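+                            // Apply the received delta page on top of the transferred partition file.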
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation is in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of partitions to include into the snapshot (null means all OWNING partitions).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
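+        // Completion of the local snapshot task is mapped into the distributed process response.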
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks failed, or they have not been executed " +
+                    "because some nodes left the cluster. The uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
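+            // The coordinator initiates the end-snapshot phase across the baseline nodes.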
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
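+    // A minimal usage sketch (hypothetical caller code, not part of this class; assumes the public
+    // Ignite.snapshot() facade which exposes this manager):
+    //
+    //   Ignite ignite = Ignition.start(cfg);   // persistence-enabled configuration
+    //   ignite.cluster().active(true);         // snapshots require an active cluster
+    //
+    //   // Blocks until the cluster-wide snapshot completes on all baseline nodes.
+    //   ignite.snapshot().createSnapshot("backup_1").get();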
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // Snapshot which has not been completed due to a local node crash must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange is started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when requested snapshot fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
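+            // Only one remote snapshot request may be in flight at a time: wait for the previous
+            // request to complete (its possible error is only logged here).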
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshotted.
+     * @param snpSndr Snapshot sender instance which processes the snapshotted data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which a snapshot request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Name of the folder for the local node, usually based on its consistent id.
+     * @return Relative path to the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}.
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Map of partitions to be received and their corresponding file page stores. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request the snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file page stores have been processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close file page stores which have not been finished.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs submitted tasks sequentially, though not necessarily on a single thread.
+     * Sequential processing is important for some {@link SnapshotSender}s since all of their
+     * sub-tasks may share a single socket channel used to send data.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param log Ignite logger.
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Tasks must not be submitted when the wrapped executor is shutting down.";
+
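+            // Wrap the task so that its completion schedules the next queued one,
+            // which yields sequential execution on top of the delegate executor.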
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /** Snapshot sender which transmits cache partition files to a remote node. */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null || relativeNodePath.isEmpty())
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partition has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of transmission params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved within the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
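+            // Write the running snapshot name into the metastorage under the checkpoint read lock,
+            // so an incomplete snapshot can be detected and cleaned up after a node crash (see onReadyForRead).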
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
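+                // Empty partition files are not copied into the snapshot.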
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
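+                // Ensure a fresh target file: remove a stale copy if present, then create an empty one.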
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
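+            // Replay every page from the delta file over the copied partition file,
+            // bringing it to the checkpoint-consistent state.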
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Copy from file.
+         * @param to Copy data to file.
+         * @param length Number of bytes to copy from beginning.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if an execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param rqId Unique snapshot request id.
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into the snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID rqId, UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.rqId = rqId;
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotOperationRequest.class, this);
+        }
+    }
+
+    /** */
+    private static class SnapshotOperationResponse implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+    }
+
+    /** Snapshot operation start message. */
+    private static class SnapshotStartDiscoveryMessage implements SnapshotDiscoveryMessage {
+        /** Serial version UID. */
+        private static final long serialVersionUID = 0L;
+
+        /** Discovery cache. */
+        private final DiscoCache discoCache;
+
+        /** Snapshot request id. */
+        private final IgniteUuid id;
+
+        /**
+         * @param discoCache Discovery cache.
+         * @param id Snapshot request id.
+         */
+        public SnapshotStartDiscoveryMessage(DiscoCache discoCache, UUID id) {
+            this.discoCache = discoCache;
+            this.id = new IgniteUuid(id, 0);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needExchange() {
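+            // Force a partition map exchange: PME serves as the cluster-wide consistency point for the snapshot.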
+            return true;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needAssignPartitions() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public IgniteUuid id() {
+            return id;
+        }
+
+        /** {@inheritDoc} */
+        @Override public @Nullable DiscoveryCustomMessage ackMessage() {
+            return null;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean isMutable() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public DiscoCache createDiscoCache(GridDiscoveryManager mgr, AffinityTopologyVersion topVer,
+            DiscoCache discoCache) {
+            return this.discoCache;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            SnapshotStartDiscoveryMessage message = (SnapshotStartDiscoveryMessage)o;
 
 Review comment:
   Abbreviation should be used

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408313641
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since that is redundant and can lead to OOM errors. A direct
+     * buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
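+        // Cluster-wide snapshot creation is driven by a two-phase distributed process:
+        // START_SNAPSHOT registers and runs local snapshot tasks on every baseline node,
+        // END_SNAPSHOT finalizes the result or cleans up incomplete snapshot files.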
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
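+    // For illustration: assuming PART_FILE_TEMPLATE resolves to the usual "part-%d.bin" layout
+    // (an assumption here, not shown in this diff), partDeltaFileName(25) yields "part-25.bin.delta",
+    // and the index partition yields "index.bin.delta".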
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when #takeSnapshot() method already invoked and distributed process
+                        // starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
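+
+        // Note on the handler above: each partition arrives in two stages. First,
+        // fileHandler(...) stores the raw partition file and opens a FilePageStore over
+        // it; then chunkHandler(...) applies the copy-on-write delta pages on top of it,
+        // and finishRecover(...) completes the partition once all delta bytes have been
+        // applied (or immediately when initMeta.count() == 0).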
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Traverse the snapshot directory for the given local node folder name and
+     * recursively delete all files from it, if any exist (files removed
+     * concurrently are skipped).
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@code U.maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
 
 Review comment:
   Fixed.


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409102157
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotMXBeanTest.java
 ##########
 @@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.mxbean.SnapshotMXBean;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+
+/**
+ * Tests {@link SnapshotMXBean}.
+ */
+public class IgniteSnapshotMXBeanTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testCreateSnapshot() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        SnapshotMXBean mxBean = getMBean(ignite.name());
+
+        mxBean.createSnapshot(SNAPSHOT_NAME);
+
+        MetricRegistry mreg = ignite.context().metric().registry(SNAPSHOT_METRICS);
+
+        LongMetric endTime = mreg.findMetric("LastSnapshotEndTime");
 
 Review comment:
   Fixed.
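
For context, a plausible continuation of this test body (an editor's sketch, not
the actual patch) would wait for the end-time metric to be updated once the
snapshot completes:

    assertTrue("Waiting for the metric value failed.",
        GridTestUtils.waitForCondition(() -> endTime.value() > 0, 10_000));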


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r407976331
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering a PME (partition map exchange) to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
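+    // Usage sketch (illustrative; assumes the IgniteSnapshot facade is exposed via
+    // Ignite#snapshot() and offers a createSnapshot(String) operation, mirroring the
+    // SnapshotMXBean#createSnapshot call used in the tests):
+    //
+    //   IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backup_2020_04_01");
+    //   fut.get(); // completes when every baseline node has written its partitions
+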
+    /** Suffix for files with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for a checkpoint started to enforce a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to a snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of one buffer per
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since per-writer buffers are redundant and can lead to OOM
+     * errors: a direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process has occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when #takeSnapshot() method already invoked and distributed process
+                        // starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Traverse the snapshot directory for the given local node folder name and
+     * recursively delete all files from it, if any exist (files removed
+     * concurrently are skipped).
+     *
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@code U.maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            List<Path> dirs = new ArrayList<>();
+
+            Files.walkFileTree(snpDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult preVisitDirectory(Path dir,
+                    BasicFileAttributes attrs) throws IOException {
+                    if (Files.isDirectory(dir) &&
+                        Files.exists(dir) &&
+                        folderName.equals(dir.getFileName().toString())) {
+                        // Directory found, add it for processing.
+                        dirs.add(dir);
+                    }
+
+                    return super.preVisitDirectory(dir, attrs);
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            dirs.forEach(U::delete);
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> startLocalSnapshot(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group ids mapped to the sets of partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void startLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot will be finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> endLocalSnapshot(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void endLocalSnapshotResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation started.
+     */
+    public boolean inProgress() {
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        if (cctx.kernalContext().clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT)) {
+            return new IgniteFinishedFutureImpl<>(new IllegalStateException("Not all nodes in the cluster support " +
+                "a snapshot operation."));
+        }
+
+        if (!active(cctx.kernalContext().state().clusterState().state())) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The cluster is inactive."));
+        }
+
+        DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException("Snapshot operation has been rejected. " +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        GridFutureAdapter<Void> snpFut0;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null && !clusterSnpFut.isDone()) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "The previous snapshot operation was not completed."));
+            }
+
+            if (clusterSnpRq != null) {
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Parallel snapshot processes are not allowed."));
+            }
+
+            if (getSnapshots().contains(name))
+                return new IgniteFinishedFutureImpl<>(new IgniteException("Create snapshot request has been rejected. " +
+                    "Snapshot with given name already exists."));
+
+            snpFut0 = new GridFutureAdapter<>();
+
+            clusterSnpFut = snpFut0;
+        }
+
+        List<Integer> grps = cctx.cache().persistentGroups().stream()
+            .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+            .filter(g -> !g.config().isEncryptionEnabled())
+            .map(CacheGroupDescriptor::groupId)
+            .collect(Collectors.toList());
+
+        List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+        startSnpProc.start(UUID.randomUUID(), new SnapshotOperationRequest(cctx.localNodeId(),
+            name,
+            grps,
+            new HashSet<>(F.viewReadOnly(srvNodes,
+                F.node2id(),
+                (node) -> CU.baselineNode(node, clusterState)))));
+
+        if (log.isInfoEnabled())
+            log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+        return new IgniteFutureImpl<>(snpFut0);
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // Snapshot which has not been completed due to the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Collection of cache group ids mapped to the sets of partitions to be included in the snapshot.
+     * @param partConsumer Handler for the received partition files.
+     * @return Future which will be completed when requested snapshot fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check for null first: calling nodeSupports() or rmtNode.id() on a node which has
+        // already left the grid would throw a NullPointerException.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Id of the node which caused the snapshot operation.
+     * @param parts Collection of cache group ids mapped to the sets of partitions to be included in the snapshot.
+     * @param snpSndr Snapshot sender instance which stores or transmits the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @return Snapshot receiver instance.
+     */
+    SnapshotSender localSnapshotSender(String snpName) {
+        return new LocalSnapshotSender(snpName);
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmissionSender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> igniteCacheStoragePath(pdsSettings),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @return Relative configured path of persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String igniteCacheStoragePath(PdsFolderSettings pcfg) {
+        return Paths.get(DB_DEFAULT_FOLDER, pcfg.folderName()).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static Path snapshotPath(IgniteConfiguration cfg) {
 
 Review comment:
   You limit the user to absolute paths only for the snapshot path; I think it's better to do something like this and allow relative paths:
   ```
   File f = new File(cfg.getSnapshotPath());
   return f.isAbsolute() ? f : new File(cfg.getWorkDirectory(), f.getPath());
   ```
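
   For illustration, a minimal sketch of how the suggested resolution behaves (the directory values below are hypothetical, not taken from the PR):
   ```
   File f = new File("snapshots");              // e.g. a relative cfg.getSnapshotPath()
   File workDir = new File("/opt/ignite/work"); // e.g. cfg.getWorkDirectory()
   File resolved = f.isAbsolute() ? f : new File(workDir, f.getPath());
   // resolved -> /opt/ignite/work/snapshots; an absolute input such as /mnt/backup is kept as-is.
   ```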

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409022510
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheProcessor.java
 ##########
 @@ -4042,6 +4047,10 @@ public void onDiscoveryEvent(
      * @return {@code True} if minor topology version should be increased.
      */
     public boolean onCustomEvent(DiscoveryCustomMessage msg, AffinityTopologyVersion topVer, ClusterNode node) {
+        if (msg instanceof InitMessage &&
 
 Review comment:
   Removed, since the API of `DistributedProcess` changed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409050109
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
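+ * <p>
+ * A minimal usage sketch for the cluster-wide case (illustrative only; it assumes the public
+ * {@code ignite.snapshot()} accessor returns this manager, and the snapshot name is made up):
+ * <pre>
+ * IgniteFuture&lt;Void&gt; fut = ignite.snapshot().createSnapshot("backup_snapshot");
+ * fut.get(); // Blocks until the snapshot is created on all baseline nodes.
+ * </pre>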
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in millisecond for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total snapshot files count which receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for loaded snapshots from the remote nodes and storing
+     * temporary partition delta-files of locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
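+     *         For example: {@code part-1.bin.delta} for partition 1, or {@code index.bin.delta} for the
+     *         index partition (assuming the default partition file name templates).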
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when #takeSnapshot() method already invoked and distributed process
+                        // starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
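
    // A sketch of the receive-side flow implemented by the transmission handler
    // registered above, in the order the callbacks fire:
    //   1. filePath()     - resolves a temporary location under tmpWorkDir for an incoming partition;
    //   2. fileHandler()  - accepts the raw partition file and opens a FilePageStore over it;
    //   3. chunkHandler() - applies the delta pages on top of it (beginRecover -> write -> finishRecover);
    //   4. onEnd()        - all partitions have been recovered and the RemoteSnapshotFuture completes.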
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not yet stopped.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Concurrently traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
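
    // A minimal usage sketch for deleteSnapshot() above; the folder name is assumed to be
    // produced from the node consistent id, the same way the PDS folder resolver does it:
    //
    //   String folderName = U.maskForFileName(consistentId.toString());
    //
    //   // Removes the binary metadata, DB files and marshaller mappings of the snapshot.
    //   deleteSnapshot(snapshotLocalDir("mySnapshot"), folderName);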
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of partitions to be snapshot ({@code null} means all partitions of the group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
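
    // A minimal sketch of triggering a cluster-wide snapshot through the public API,
    // assuming Ignite#snapshot() exposes this manager via the IgniteSnapshot interface:
    //
    //   Ignite ignite = Ignition.start(cfg);
    //   ignite.cluster().state(ClusterState.ACTIVE);
    //
    //   // Blocks until every baseline node finishes its local snapshot task.
    //   ignite.snapshot().createSnapshot("mySnapshot").get();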
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
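
    // Crash-recovery flow for a node that died mid-snapshot, per the two listener
    // methods above (the ordering is guaranteed by MetastorageLifecycleListener):
    //   1. onReadyForRead()      - SNP_RUNNING_KEY found: wipe the temp directory and the
    //                              half-written snapshot, remember recovered = true;
    //   2. onReadyForReadWrite() - the metastorage is writable again: remove SNP_RUNNING_KEY.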
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
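
    // A note on the hook above: the snapshot task is started while the PME topology lock
    // is still held, so forceCheckpoint() plus awaitStarted() ensure the checkpoint captures
    // a cluster-wide consistent partition state before cache writes resume.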
+
+    /**
+     * @param parts Map of cache group ids to the sets of partitions to be snapshot.
+     * @param rmtNodeId The remote node to connect to.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when requested snapshot fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check the node is still in the topology prior to dereferencing it.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
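
    // A minimal usage sketch for createRemoteSnapshot() above; the consumer is invoked once
    // per fully recovered partition file (the group and partition ids are illustrative):
    //
    //   Map<Integer, Set<Integer>> parts = new HashMap<>();
    //   parts.put(CU.cacheId("someCache"), new HashSet<>(Arrays.asList(0, 1, 2)));
    //
    //   snpMgr.createRemoteSnapshot(rmtNodeId, parts, (file, pair) ->
    //       log.info("Partition received [file=" + file + ", grpPart=" + pair + ']')).get();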
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Id of the node which caused the snapshot operation.
+     * @param parts Map of cache group ids to the sets of partitions to be snapshot.
+     * @param snpSndr Sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408818297
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition files consisting of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, but off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for the snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
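
    // The two DistributedProcess instances wired above form a two-phase flow
    // (a sketch of the ordering, per the stage methods of this class):
    //   START_SNAPSHOT: initLocalSnapshotStartStage() runs on every baseline node, then
    //                   processLocalSnapshotStartStageResult() collects the results and,
    //                   on the coordinator, starts the END_SNAPSHOT phase;
    //   END_SNAPSHOT:   initLocalSnapshotEndStage() finalizes or cleans up locally, then
    //                   processLocalSnapshotEndStageResult() completes clusterSnpFut.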
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
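
    // For example, assuming PART_FILE_TEMPLATE is "part-%d.bin" and INDEX_FILE_NAME is
    // "index.bin" (as defined in FilePageStoreManager):
    //
    //   partDeltaFileName(5);               // -> "part-5.bin.delta"
    //   partDeltaFileName(INDEX_PARTITION); // -> "index.bin.delta"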
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files (they may be removed concurrently).
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
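+            // Remove the snapshot root directory only when nothing is left under its 'db' folder.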
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group id to the set of partitions to be snapshot ('null' means all partitions of the group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks failed, or some tasks were not executed " +
+                    "because nodes left the cluster. The incomplete snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
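+            // The coordinator always starts the end phase, so every node either
+            // finalizes its local snapshot or deletes the incomplete one.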
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot will be finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has finished with an error. " +
+                        "Local snapshot tasks may not have finished completely or finalizing the results failed " +
+                        "[hasErr=" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for the cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
 
 Review comment:
  There is some lag between collecting the server nodes and the discovery message that starts the distributed process. A node can leave the grid right before the snapshot starts; in this case the snapshot will fail only after the distributed process completes. It's better to fill bltNodes while processing the snapshot start message, or at least to check that the topology has changed when the snapshot starts and fail fast.
  The same applies to cache groups.
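
  A minimal sketch of the suggested fail-fast guard, assuming it is placed at the top of
  initLocalSnapshotStartStage(SnapshotOperationRequest req); the srvIds/missed locals are illustrative:

      // Fail fast if a baseline node left the grid between the user request
      // and the start of the distributed process.
      Set<UUID> srvIds = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE).stream()
          .map(ClusterNode::id)
          .collect(Collectors.toSet());

      Set<UUID> missed = new HashSet<>(req.bltNodes);
      missed.removeAll(srvIds);

      if (!missed.isEmpty()) {
          return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been " +
              "rejected. Baseline node(s) left the cluster: " + missed));
      }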


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410148969
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create client accounts, each with the initial balance.
+        for (int i = 0; i < clients; i++) {
+            eastCache.put(i, new Account(i, balance));
+            westCache.put(i, new Account(i, balance));
+        }
+
+        assertEquals("The initial summary value in all caches is not correct.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
+
+        forceCheckpoint();
+
+        IgniteInternalFuture<?> txLoadFut = GridTestUtils.runMultiThreadedAsync(
+            () -> {
+                ThreadLocalRandom rnd = ThreadLocalRandom.current();
+
+                int amount;
+
+                try {
+                    while (!stop.get()) {
+                        IgniteEx ignite = grid(rnd.nextInt(grids));
+                        IgniteCache<Integer, Account> east = ignite.cache("east");
+                        IgniteCache<Integer, Account> west = ignite.cache("west");
+
+                        amount = rnd.nextInt(transferLimit);
+
+                        try (Transaction tx = ignite.transactions().txStart()) {
+                            Integer id = rnd.nextInt(clients);
+
+                            Account acc0 = east.get(id);
+                            Account acc1 = west.get(id);
+
+                            acc0.balance -= amount;
+
+                            txStarted.countDown();
 
 Review comment:
   Fixed.
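
  For context, a minimal sketch of the sumAllCacheValues helper the test relies on (a hypothetical
  reconstruction; the actual helper in the PR may differ). It sums the Account.balance field over all
  client ids in the given caches:

      /** Hypothetical reconstruction: sums Account.balance over all client ids in the given caches. */
      private static int sumAllCacheValues(Ignite node, int clients, String... cacheNames) {
          int sum = 0;

          for (String cacheName : cacheNames) {
              IgniteCache<Integer, Account> cache = node.cache(cacheName);

              for (int id = 0; id < clients; id++)
                  sum += cache.get(id).balance;
          }

          return sum;
      }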


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408971404
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint forced to start a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks when the node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads performing local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter: the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter: the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of one buffer per
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since the latter is redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock protecting the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with partition delta files. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
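+        // A cluster snapshot is driven by a two-phase distributed process: the START phase
+        // creates local snapshots on every baseline node, the END phase finalizes them or
+        // cleans up after a failure.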
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
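+        // Expose the state of the last snapshot operation through the 'snapshot' metric registry.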
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of the last started cluster snapshot operation that failed. This value will be " +
+                "'null' if the last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node, with respect to " +
+                "the snapshot path configured via IgniteConfiguration.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message with the snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
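+                    // Received partitions are staged under tmpWorkDir/<snpName>/<rmtDbNodePath>/<cacheDirName>.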
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
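+                // The expected partition count arrives with the transmission meta; remember it only once.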
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files (they may be removed concurrently).
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Cache group ids mapped to the sets of partitions to be included into the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
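+        // A null partition set for a group means that all its owning partitions are included into the snapshot.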
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
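+            // Baseline nodes that neither responded nor reported an error are considered to have left the cluster.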
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
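+    // A minimal usage sketch of the API above (assuming the manager is exposed via ignite.snapshot()):
+    //   IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backup_2020_04_01");
+    //   fut.get(); // Blocks until the cluster-wide snapshot operation completes on all baseline nodes.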
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Cache group ids mapped to the sets of partitions to be snapshot.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
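+    // Usage sketch with hypothetical arguments: request partition 0 of a single cache group
+    // from a remote node and log every received partition file:
+    //   snpMgr.createRemoteSnapshot(rmtNodeId,
+    //       F.asMap(CU.cacheId("someCache"), Collections.singleton(0)),
+    //       (file, pair) -> log.info("Received " + file + " for " + pair));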
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Id of the node which caused the snapshot operation.
+     * @param parts Cache group ids mapped to the sets of partitions to be snapshot.
+     * @param snpSndr Sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
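+            // putIfAbsent re-checks the registration atomically to avoid a race between concurrent requests.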
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     * @return Relative path to the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file stores processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close unfinished file page stores.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * Such an executor does not run tasks on one dedicated thread, but still executes them
+     * sequentially, possibly on different threads. This is important for some {@link SnapshotSender}s
+     * which must process their sub-tasks sequentially, since all these sub-tasks may share a single
+     * socket channel to send data to.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor shutdown.";
+
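+            // Wrap the task so that the next queued task is scheduled once this one completes.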
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
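+    // Usage sketch (hypothetical task): tasks submitted to the wrapper never interleave, even though
+    // they may run on different threads of the delegate pool:
+    //   Executor seq = new SequentialExecutorWrapper(log, snpRunner);
+    //   seq.execute(() -> sendNextChunk()); // Runs strictly after all previously submitted tasks.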
+
+    /**
+     *
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sender tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partition has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node database directory resolved under the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
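+            // Record the running snapshot name in the metastorage so that an incomplete snapshot
+            // can be detected and cleaned up on node restart (see onReadyForRead).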
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
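+                // Create an empty partition file, truncating any leftover content from a previous attempt.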
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
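+                // Replay delta pages one by one; the page id stored in each page defines its offset in the page store.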
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Source file to copy from.
+         * @param to Destination file to copy to.
+         * @param length Number of bytes to copy from the beginning of the source file.
+         * @throws IOException If an I/O error occurs.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
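+                // transferTo() may copy fewer bytes than requested, so loop until the full length is written.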
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot operation request used as the {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if an execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param rqId Unique snapshot request id.
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID rqId, UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.rqId = rqId;
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotOperationRequest.class, this);
+        }
+    }
+
+    /** */
+    private static class SnapshotOperationResponse implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+    }
+
+    /** Snapshot operation start message. */
+    private static class SnapshotStartDiscoveryMessage implements SnapshotDiscoveryMessage {
+        /** Serial version UID. */
+        private static final long serialVersionUID = 0L;
+
+        /** Discovery cache. */
+        private final DiscoCache discoCache;
+
+        /** Snapshot request id. */
+        private final IgniteUuid id;
+
+        /**
+         * @param discoCache Discovery cache.
+         * @param id Snapshot request id.
+         */
+        public SnapshotStartDiscoveryMessage(DiscoCache discoCache, UUID id) {
+            this.discoCache = discoCache;
+            this.id = new IgniteUuid(id, 0);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needExchange() {
+            return true;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needAssignPartitions() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public IgniteUuid id() {
+            return id;
+        }
+
+        /** {@inheritDoc} */
+        @Override public @Nullable DiscoveryCustomMessage ackMessage() {
+            return null;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean isMutable() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public DiscoCache createDiscoCache(GridDiscoveryManager mgr, AffinityTopologyVersion topVer,
+            DiscoCache discoCache) {
+            return this.discoCache;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            SnapshotStartDiscoveryMessage message = (SnapshotStartDiscoveryMessage)o;
+
+            return id.equals(message.id);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(id);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotStartDiscoveryMessage.class, this);
+        }
+    }
+
+    /** */
+    private static class ClusterSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Snapshot name */
 
 Review comment:
   Point

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408802631
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of requested cache groups and send it to the node which requested the operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * may be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
 
 Review comment:
   Use two different error messages for the two different cases. The second condition is no longer related to this error message.
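
   As an illustration, a minimal sketch of such a split (a fragment against the quoted listener code; the exact message texts are assumptions, not the PR's wording):

       if (sctx.sourceNodeId().equals(leftNodeId)) {
           // The node that requested the snapshot has left: the result cannot be delivered anymore.
           sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
               "creation has left the grid [nodeId=" + leftNodeId + ']'));
       }
       else if (snpRq != null && snpRq.snpName.equals(sctx.snapshotName()) && snpRq.bltNodes.contains(leftNodeId)) {
           // A baseline node taking part in the cluster-wide snapshot has left: the snapshot cannot be consistent.
           sctx.acceptException(new ClusterTopologyCheckedException("A baseline node required by the " +
               "cluster snapshot operation has left the grid [nodeId=" + leftNodeId + ']'));
       }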

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410051015
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create client accounts with the initial balance.
+        for (int i = 0; i < clients; i++) {
+            eastCache.put(i, new Account(i, balance));
+            westCache.put(i, new Account(i, balance));
+        }
+
+        assertEquals("The initial summary value in all caches is not correct.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
 
 Review comment:
   `clients` here reads like a collection being passed to `sumAllCacheValues`. Let's rename this variable to `clientsCnt`, or use a scan query inside `sumAllCacheValues` and remove the `clients` parameter.
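
   For example, a scan-query-based variant might look like this (a sketch only; the exact signature and the `Account` balance field name are assumptions, imports omitted):

       /** Sums the values of all entries in the given caches via scan queries. */
       private static int sumAllCacheValues(Ignite node, String... cacheNames) {
           int sum = 0;

           for (String name : cacheNames) {
               try (QueryCursor<Cache.Entry<Integer, Account>> cur =
                        node.<Integer, Account>cache(name).query(new ScanQuery<>())) {
                   for (Cache.Entry<Integer, Account> e : cur)
                       sum += e.getValue().balance;
               }
           }

           return sum;
       }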

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408764772
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text Reason for checkpoint to start snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path, including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), which would be redundant and could lead to OOM errors. A direct
+     * buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory may run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
 
 Review comment:
   I don't like the idea of injecting into discovery and using the internal `onDiscoveryEvent` method. I think we can do it with only one event. For example, `SnapshotStartDiscoveryMessage` can extend `InitMessage`, and we can pass the `InitMessage` class and an `InitMessage` factory to the `DistributedProcess` constructor. In that case, you don't need to make `onDiscoveryEvent` public and can use the standard discovery workflow.
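
   A rough sketch of that approach (the extra `DistributedProcess` constructor parameter and the message shape are hypothetical, shown only to illustrate the idea):

       /** Snapshot-specific init message handled by the standard discovery workflow. */
       public class SnapshotStartDiscoveryMessage extends InitMessage<SnapshotOperationRequest> {
           // ... snapshot-specific state, e.g. exchange-related flags, goes here ...
       }

       // DistributedProcess could then accept a factory for the custom InitMessage subclass:
       startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT,
           this::initLocalSnapshotStartStage,
           this::processLocalSnapshotStartStageResult,
           SnapshotStartDiscoveryMessage::new); // hypothetical factory parameter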

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410054518
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -354,6 +354,42 @@ public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
             err_msg);
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteWithNodeFilter() throws Exception {
+        int grids = 3;
+        CacheConfiguration<Integer, Integer> ccfg = txCacheConfig(new CacheConfiguration<Integer, Integer>(DEFAULT_CACHE_NAME))
+            .setNodeFilter(node -> node.consistentId().toString().endsWith("1"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration()));
+
+        IgniteEx ig0 = grid(0);
+        ig0.cluster().baselineAutoAdjustEnabled(false);
+        ig0.cluster().state(ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.getOrCreateCache(ccfg).put(i, i);
+
+        forceCheckpoint();
 
 Review comment:
   Redundant: the snapshot operation enforces its own checkpoint, so an explicit `forceCheckpoint()` call is not needed here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409145417
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text Reason for checkpoint to start snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path, including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), which would be redundant and could lead to OOM errors. A direct
+     * buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory may run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
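+        // Two-phase distributed snapshot process: the start phase creates and runs local snapshot
+        // tasks on every baseline node, the end phase finalizes the results and deletes incomplete
+        // files if the operation failed.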
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the future-done listener.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files; files removed concurrently are skipped.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
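+            // Remove the snapshot root directory if no node data is left inside it.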
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group identifiers mapped to the sets of partitions to be included in the snapshot.
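+        // A null partition set for a group means that all partitions owned by the local node
+        // will be included in the snapshot.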
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
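+            // The coordinator detects baseline nodes that produced neither a response nor an error.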
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
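+        // Fast path: the volatile read avoids taking the mutex when a request is already in progress.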
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
 
 Review comment:
   I've fixed it. Baseline nodes and cache groups are now checked prior to snapshot task creation.
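
   For reference, the kind of pre-check described here might look like the following minimal
   sketch (the helper calls, names, and exception texts are illustrative assumptions, not
   necessarily the PR's actual code):

       // Hypothetical sketch: verify the requested cache groups exist and all baseline
       // nodes are still alive before the snapshot task is registered.
       for (Integer grpId : req.grpIds) {
           if (cctx.cache().cacheGroup(grpId) == null)
               throw new IgniteCheckedException("Cache group not found [grpId=" + grpId + ']');
       }

       Set<UUID> aliveNodeIds = cctx.discovery().aliveServerNodes().stream()
           .map(ClusterNode::id)
           .collect(Collectors.toSet());

       if (!aliveNodeIds.containsAll(req.bltNodes))
           throw new IgniteCheckedException("Some baseline nodes have left the cluster: " + req.bltNodes);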


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205253
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/managers/communication/FileReceiver.java
 ##########
 @@ -82,7 +82,8 @@ public FileReceiver(
             fileIo.position(meta.offset());
         }
         catch (IOException e) {
-            throw new IgniteException("Unable to open destination file. Receiver will will be stopped", e);
+            throw new IgniteException("Unable to open destination file. Receiver will will be stopped: " +
 
 Review comment:
   Fixed.


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409741428
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start grid node with data before each test.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Collection of cache group identifiers mapped to the sets of partitions to be included in the snapshot.
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently only a single node in the cluster),
+        // so the snapshot partitions must contain the same data as the partitions left
+        // after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency is checked by starting a test node right from the snapshot directory and
+     * verifying that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start grid node with data before each test.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; it must be included into the snapshot with the correct value multiplier.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, valMultiplier * i));
+
+            // Snapshot is still in the INIT state. beforeCheckpoint has been skipped
+            // because a checkpoint is already running, so we need to schedule the next one
+            // right after the current one completes.
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, SNAPSHOT_NAME));
+
+            snpFutTask.awaitStarted();
+
+            db.forceCheckpoint("snapshot is ready to be created")
+                .futureFor(CheckpointState.MARKER_STORED_TO_DISK)
+                .get();
+
+            // Change data after snapshot.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, 3 * i));
+
+            // On the next checkpoint the snapshot must copy each page to a delta file before writing it to a partition.
+            forceCheckpoint(ig);
+
+            slowCopy.countDown();
+
+            snpFutTask.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // Now we can stop the node and check the created snapshots.
+        stopGrid(0);
+
+        cleanPersistenceDir(ig.name());
+
+        // Start Ignite instance from snapshot directory.
+        IgniteEx ig2 = startGridsFromSnapshot(1, SNAPSHOT_NAME);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++) {
+            assertEquals("snapshot data consistency violation [key=" + i + ']',
+                i * valMultiplier, ((TestOrderItem)ig2.cache(DEFAULT_CACHE_NAME).get(i)).value);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitionNotEnoughSpace() throws Exception {
+        String err_msg = "Test exception. Not enough space.";
+        AtomicInteger throwCntr = new AtomicInteger();
+        RandomAccessFileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+        IgniteEx ig = startGridWithCache(dfltCacheCfg.setAffinity(new ZeroPartitionAffinityFunction()),
+            CACHE_KEYS_RANGE);
+
+        // Change data after backup.
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, 2 * i);
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+
+        IgniteSnapshotManager mgr = snp(ig);
+
+        mgr.ioFactory(new FileIOFactory() {
+            @Override public FileIO create(File file, OpenOption... modes) throws IOException {
+                FileIO fileIo = ioFactory.create(file, modes);
+
+                if (file.getName().equals(IgniteSnapshotManager.partDeltaFileName(0)))
+                    return new FileIODecorator(fileIo) {
+                        @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
+                            if (throwCntr.incrementAndGet() == 3)
+                                throw new IOException(err_msg);
+
+                            return super.writeFully(srcBuf);
+                        }
+                    };
+
+                return fileIo;
+            }
+        });
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        // Check that the right exception is thrown.
+        assertThrowsAnyCause(log,
+            snpFut::get,
+            IOException.class,
+            err_msg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
+        String err_msg = "Test. Fail to copy partition: ";
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), new HashSet<>(Collections.singletonList(0)));
+
+        IgniteSnapshotManager mgr0 = snp(ig);
+
+        IgniteInternalFuture<?> fut = startLocalSnapshotTask(ig.context().cache().context(),
+            SNAPSHOT_NAME,
+            parts,
+            new DelegateSnapshotSender(log, mgr0.snapshotExecutorService(),
+                mgr0.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    if (pair.getPartitionId() == 0)
+                        throw new IgniteException(err_msg + pair);
+
+                    delegate.sendPart0(part, cacheDirName, pair, length);
+                }
+            });
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteException.class,
+            err_msg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemotePartitionsWithLoad() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        AtomicInteger cntr = new AtomicInteger();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, cntr.incrementAndGet());
+
+        GridCacheSharedContext<?, ?> cctx1 = grid(1).context().cache().context();
+        GridCacheDatabaseSharedManager db1 = (GridCacheDatabaseSharedManager)cctx1.database();
+
+        forceCheckpoint();
+
+        Map<String, Integer> rmtPartCRCs = new HashMap<>();
+        CountDownLatch cancelLatch = new CountDownLatch(1);
+
+        db1.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                SnapshotFutureTask task = cctx1.snapshotMgr().lastScheduledRemoteSnapshotTask(grid(0).localNode().id());
+
+                // Skip the first remote snapshot creation since it will be cancelled.
+                if (task == null || cancelLatch.getCount() > 0)
+                    return;
+
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(task,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty()) {
+                    assert rmtPartCRCs.isEmpty();
+
+                    // Calculate actual partition CRCs when the checkpoint finishes on this node.
+                    ctx.finishedStateFut().listen(f -> {
+                        File cacheWorkDir = ((FilePageStoreManager)grid(1).context().cache().context().pageStore())
+                            .cacheWorkDir(dfltCacheCfg);
+
+                        rmtPartCRCs.putAll(calculateCRC32Partitions(cacheWorkDir));
+                    });
+                }
+            }
+        });
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+        Map<String, Integer> snpPartCRCs = new HashMap<>();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted())
+                ig0.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(), cntr.incrementAndGet());
+        }, 5, "cache-loader-");
+
+        try {
+            // Snapshot must be taken on node1 and transmitted to node0.
+            IgniteInternalFuture<?> fut = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                new BiConsumer<File, GroupPartitionId>() {
+                    @Override public void accept(File file, GroupPartitionId gprPartId) {
+                        log.info("Snapshot partition received successfully [rmtNodeId=" + rmtNodeId +
+                            ", part=" + file.getAbsolutePath() + ", gprPartId=" + gprPartId + ']');
+
+                        cancelLatch.countDown();
+                    }
+                });
+
+            cancelLatch.await();
+
+            fut.cancel();
+
+            IgniteInternalFuture<?> fut2 = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                (part, pair) -> {
+                    try {
+                        snpPartCRCs.put(part.getName(), FastCrc.calcCrc(part));
+                    }
+                    catch (IOException e) {
+                        throw new IgniteException(e);
+                    }
+                });
+
+            fut2.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        assertEquals("Partitions from remote node must have the same CRCs as those which have been received",
+            rmtPartCRCs, snpPartCRCs);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteOnBothNodes() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        forceCheckpoint(ig0);
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+        IgniteSnapshotManager mgr1 = snp(grid(1));
+
+        UUID node0 = grid(0).localNode().id();
+        UUID node1 = grid(1).localNode().id();
+
+        Map<Integer, Set<Integer>> fromNode1 = owningParts(ig0,
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node1);
+
+        Map<Integer, Set<Integer>> fromNode0 = owningParts(grid(1),
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> futFrom1To0 = mgr0.requestRemoteSnapshot(node1, fromNode1,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode1.get(pair.getGroupId())
+                    .remove(pair.getPartitionId())));
+        IgniteInternalFuture<?> futFrom0To1 = mgr1.requestRemoteSnapshot(node0, fromNode0,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode0.get(pair.getGroupId())
+                .remove(pair.getPartitionId())));
+
+        futFrom0To1.get();
+        futFrom1To0.get();
+
+        assertTrue("Not all of partitions have been received: " + fromNode1,
+            fromNode1.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+        assertTrue("Not all of partitions have been received: " + fromNode0,
+            fromNode0.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = ClusterTopologyCheckedException.class)
+    public void testRemoteSnapshotRequestedNodeLeft() throws Exception {
+        IgniteEx ig0 = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+        IgniteEx ig1 = startGrid(1);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        CountDownLatch hold = new CountDownLatch(1);
+
+        ((GridCacheDatabaseSharedManager)ig1.context().cache().context().database())
+            .addCheckpointListener(new DbCheckpointListener() {
+                /** {@inheritDoc} */
+                @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
+                    // Listener will be executed inside the checkpoint thread.
+                    U.await(hold);
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onMarkCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+            });
+
+        UUID rmtNodeId = ig1.localNode().id();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        snp(ig0).requestRemoteSnapshot(rmtNodeId, parts, (part, grp) -> {});
+
+        IgniteInternalFuture<?>[] futs = new IgniteInternalFuture[1];
+
+        assertTrue(GridTestUtils.waitForCondition(new GridAbsPredicate() {
+            @Override public boolean apply() {
+                IgniteInternalFuture<Boolean> snpFut = snp(ig1)
+                    .lastScheduledRemoteSnapshotTask(ig0.localNode().id());
+
+                if (snpFut == null)
+                    return false;
+                else
+                    futs[0] = snpFut;
+
+                return true;
+            }
+        }, 5_000L));
+
+        stopGrid(0);
+
+        hold.countDown();
+
+        futs[0].get();
+    }
+
+    /**
+     * <pre>
+     * 1. Start 2 nodes.
+     * 2. Request snapshot from the 2-nd node.
+     * 3. Block snapshot-request message.
+     * 4. Start 3-rd node and change BLT.
+     * 5. Stop 3-rd node and change BLT.
+     * 6. The 2-nd node now has MOVING partitions to be preloaded.
+     * 7. Release snapshot-request message.
+     * 8. Snapshot creation must fail since MOVING partitions cannot be snapshotted.
+     * </pre>
+     *
+     * @throws Exception If fails.
+     */
+    @Test(expected = IgniteCheckedException.class)
+    public void testRemoteOutdatedSnapshot() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        awaitPartitionMapExchange();
+
+        forceCheckpoint();
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> snpFut = mgr0.requestRemoteSnapshot(rmtNodeId,
+            owningParts(ig0, new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))), rmtNodeId),
+            (part, grp) -> {});
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .waitForBlocked();
+
+        startGrid(2);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        stopGrid(2);
+
+        TestRecordingCommunicationSpi.spi(grid(1))
+            .blockMessages((node, msg) -> msg instanceof GridDhtPartitionDemandMessage);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);
+
+        snpFut.get();
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = IgniteCheckedException.class)
+    public void testLocalSnapshotOnCacheStopped() throws Exception {
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(1);
+
+        ig.cluster().state(ClusterState.ACTIVE);
+
+        awaitPartitionMapExchange();
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        CountDownLatch cpLatch = new CountDownLatch(1);
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(cpLatch);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        IgniteCache<?, ?> cache = ig.getOrCreateCache(DEFAULT_CACHE_NAME);
+
+        cache.destroy();
+
+        cpLatch.countDown();
+
+        snpFut.get(5_000, TimeUnit.MILLISECONDS);
+    }
+
+    /**
+     * @param src Source node to collect owning partitions from.
+     * @param grps Groups to collect owning parts.
+     * @param rmtNodeId Remote node id.
+     * @return Map of collected parts.
+     */
+    private static Map<Integer, Set<Integer>> owningParts(IgniteEx src, Set<Integer> grps, UUID rmtNodeId) {
+        Map<Integer, Set<Integer>> result = new HashMap<>();
+
+        for (Integer grpId : grps) {
+            Set<Integer> parts = src.context()
+                .cache()
+                .cacheGroup(grpId)
+                .topology()
+                .partitions(rmtNodeId)
+                .entrySet()
+                .stream()
+                .filter(p -> p.getValue() == GridDhtPartitionState.OWNING)
+                .map(Map.Entry::getKey)
+                .collect(Collectors.toSet());
+
+            result.put(grpId, parts);
+        }
+
+        return result;
+    }
+
+    /**
+     * @param cctx Shared cache context.
+     * @param snpName Unique snapshot name.
+     * @param parts Map of cache group IDs and the corresponding cache partitions to include in the snapshot.
+     * @param snpSndr Sender which is used for snapshot sub-task processing.
+     * @return Future which will be completed when the snapshot is done.
+     */
+    private static SnapshotFutureTask startLocalSnapshotTask(
+        GridCacheSharedContext<?, ?> cctx,
+        String snpName,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) throws IgniteCheckedException {
+        SnapshotFutureTask snpFutTask = cctx.snapshotMgr().registerSnapshotTask(snpName, cctx.localNodeId(), parts, snpSndr);
+
+        snpFutTask.start();
+
+        // Snapshot task is still in the INIT state. beforeCheckpoint has been skipped
+        // because a checkpoint is already running, so we need to schedule the next one
+        // right after the current one completes.
+        cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, snpName));
+
+        snpFutTask.awaitStarted();
+
+        return snpFutTask;
+    }
+
+    /** */
+    private static class ZeroPartitionAffinityFunction extends RendezvousAffinityFunction {
+        @Override public int partition(Object key) {
+            return 0;
+        }
+    }
+
+    /** */
+    private static class TestOrderItem implements Serializable {
+        /** Serial version. */
+        private static final long serialVersionUID = 0L;
+
+        /** Order key. */
+        private final int key;
+
+        /** Order value. */
+        private final int value;
 
 Review comment:
   Fixed.

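A minimal sketch of the remote-snapshot request pattern exercised by the tests
above, assuming the internal IgniteSnapshotManager API quoted in this thread
(mgr, rmtNodeId and log are stand-ins taken from the test code, and a null
partition set appears to request all owned partitions of the group, as in the
tests):

    Map<Integer, Set<Integer>> parts = new HashMap<>();
    parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null); // All owned partitions of the group.

    // Ask the remote node to snapshot the requested partitions and transmit them back.
    IgniteInternalFuture<?> fut = mgr.requestRemoteSnapshot(rmtNodeId, parts,
        (part, grpPartId) -> {
            // Invoked once per received partition file.
            log.info("Received partition file: " + part.getAbsolutePath());
        });

    fut.get(); // Wait until all requested partitions have been received.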

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410035608
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create clients with zero balance.
 
 Review comment:
   With initial balance

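A minimal sketch of the cross-cache transfer this test performs under load. It
assumes Account exposes a mutable balance field (the quoted diff does not show
the class body), and fromId, toId and amount are illustrative variables:

    // Move money between the "east" and "west" caches in a single transaction.
    // A consistent snapshot must preserve the invariant: the total balance is constant.
    try (Transaction tx = client.transactions().txStart()) {
        Account from = eastCache.get(fromId);
        Account to = westCache.get(toId);

        from.balance -= amount;
        to.balance += amount;

        eastCache.put(fromId, from);
        westCache.put(toId, to);

        tx.commit();
    }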

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409381353
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Function;
+import java.util.function.Predicate;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.cache.CacheAtomicityMode;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplyMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.ObjectGauge;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.FullMessage;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.metric.LongMetric;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.apache.ignite.transactions.Transaction;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.apache.ignite.cluster.ClusterState.ACTIVE;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNAPSHOT_METRICS;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_IN_PROGRESS_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.SNP_NODE_STOPPING_ERR_MSG;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.isSnapshotOperation;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.resolveSnapshotWorkDirectory;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsWithCause;
+
+/**
+ * Cluster-wide snapshot test.
+ */
+public class IgniteClusterSnapshotSelfTest extends AbstractSnapshotSelfTest {
+    /** Random instance. */
+    private static final Random R = new Random();
+
+    /** Time to wait while rebalance may happen. */
+    private static final long REBALANCE_AWAIT_TIME = GridTestUtils.SF.applyLB(10_000, 3_000);
+
+    /** Cache configuration for test. */
+    private static CacheConfiguration<Integer, Integer> txCcfg = new CacheConfiguration<Integer, Integer>("txCacheName")
+        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
+        .setBackups(2)
+        .setAffinity(new RendezvousAffinityFunction(false)
+            .setPartitions(CACHE_PARTS_COUNT));
+
+    /** {@code true} if the node should be started in a separate JVM. */
+    protected volatile boolean jvm;
+
+    /** @throws Exception If fails. */
+    @Before
+    @Override public void beforeTestSnapshot() throws Exception {
+        super.beforeTestSnapshot();
+
+        jvm = false;
+    }
+
+    /**
+     * Take snapshot from the whole cluster and check snapshot consistency.
+     * Note: Client nodes and server nodes not in baseline topology must not be affected.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testConsistentClusterSnapshotUnderLoad() throws Exception {
+        int grids = 3;
+        String snpName = "backup23012020";
+        AtomicInteger atKey = new AtomicInteger(CACHE_KEYS_RANGE);
+        AtomicInteger txKey = new AtomicInteger(CACHE_KEYS_RANGE);
+
+        IgniteEx ignite = startGrids(grids);
+        startClientGrid();
+
+        ignite.cluster().baselineAutoAdjustEnabled(false);
+        ignite.cluster().state(ACTIVE);
+
+        // Start node not in baseline.
+        IgniteEx notBltIgnite = startGrid(grids);
+        File locSnpDir = snp(notBltIgnite).snapshotLocalDir(SNAPSHOT_NAME);
+        String notBltDirName = folderName(notBltIgnite);
+
+        IgniteCache<Integer, Integer> cache = ignite.createCache(txCcfg);
+
+        for (int idx = 0; idx < CACHE_KEYS_RANGE; idx++) {
+            cache.put(txKey.incrementAndGet(), -1);
+            ignite.cache(DEFAULT_CACHE_NAME).put(atKey.incrementAndGet(), -1);
+        }
+
+        forceCheckpoint();
+
+        CountDownLatch loadLatch = new CountDownLatch(1);
+
+        ignite.context().cache().context().exchange().registerExchangeAwareComponent(new PartitionsExchangeAware() {
+            /** {@inheritDoc} */
+            @Override public void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                // First discovery custom event will be a snapshot operation.
+                assertTrue(isSnapshotOperation(fut.firstEvent()));
+                assertTrue("Snapshot must use pme-free exchange", fut.context().exchangeFreeSwitch());
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {
+                if (fut.firstEvent().type() != EVT_DISCOVERY_CUSTOM_EVT)
+                    return;
+
+                DiscoveryCustomMessage msg = ((DiscoveryCustomEvent)fut.firstEvent()).customMessage();
+
+                assertNotNull(msg);
+
+                if (msg instanceof SnapshotDiscoveryMessage)
+                    loadLatch.countDown();
+            }
+        });
+
+        // Start cache load
+        IgniteInternalFuture<Long> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(loadLatch);
+
+                while (!Thread.currentThread().isInterrupted()) {
+                    int txIdx = R.nextInt(grids);
+
+                    // zero out the sign bit
 
 Review comment:
   Upcase, point

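For reference, the idiom the quoted comment refers to is presumably masking the
sign bit of a random int (the masked line itself is cut off in the quote, so
this is an assumption):

    // Masking with Integer.MAX_VALUE (0x7FFFFFFF) zeroes bit 31,
    // so the resulting key is always non-negative.
    int key = R.nextInt() & Integer.MAX_VALUE;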

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409054064
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for a checkpoint started to enforce the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta-files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
 
 Review comment:
   Fixed.

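The locBuff field in the quoted diff applies a one-direct-buffer-per-thread
pattern. A minimal standalone sketch of that pattern, not the Ignite
implementation itself (PageBufferHolder and pageSize are illustrative names):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    class PageBufferHolder {
        /** One reusable direct buffer per thread, sized to a single page. */
        private final ThreadLocal<ByteBuffer> locBuff;

        PageBufferHolder(int pageSize) {
            locBuff = ThreadLocal.withInitial(() ->
                ByteBuffer.allocateDirect(pageSize).order(ByteOrder.nativeOrder()));
        }

        /** @return The calling thread's page buffer, cleared for reuse. */
        ByteBuffer pageBuffer() {
            ByteBuffer buf = locBuff.get();

            buf.clear(); // Reuse the same off-heap buffer instead of allocating one per writer.

            return buf;
        }
    }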

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409104355
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for a checkpoint started to enforce the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}) since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, but off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
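+        // Two-phase distributed snapshot protocol: the START phase registers and runs local
+        // snapshot tasks on every baseline node; once all nodes report their START result,
+        // the coordinator initiates the END phase which finalizes or cleans up the snapshot.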
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
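+     * For example, assuming {@code PART_DELTA_TEMPLATE} is {@code "part-%d.bin.delta"},
+     * {@code partDeltaFileName(1)} resolves to {@code "part-1.bin.delta"}.
+     *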
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try stop all snapshot processing if not yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Cache group ids mapped to the sets of cache partitions to be snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
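+    // A minimal usage sketch of the public API backed by this method (assuming the
+    // Ignite#snapshot() facade introduced together with this change):
+    //
+    //     Ignite ignite = Ignition.start(cfg);
+    //     ignite.snapshot().createSnapshot("snapshot_01").get(); // Blocks until the cluster-wide snapshot completes.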
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which has not been completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
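+        // A value under SNP_RUNNING_KEY means the node stopped in the middle of a snapshot:
+        // wipe both the temporary working files and the half-written snapshot directory.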
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Cache group ids mapped to the sets of cache partitions to be snapshot.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check for null first: the remote node may have left the grid, and the feature
+        // check below dereferences the node instance.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Cache group ids mapped to the sets of cache partitions to be snapshot.
+     * @param snpSndr Sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create an IO interface over page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id on which requests have been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name.
+     * @return Relative path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
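+     * Resolves the snapshot work directory: the path from {@code IgniteConfiguration#getSnapshotPath()}
+     * when set, otherwise a default sub-directory of the Ignite work directory (e.g.
+     * {@code $IGNITE_WORK_DIR/snapshots}, assuming {@code DFLT_SNAPSHOT_DIRECTORY} is {@code "snapshots"}).
+     *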
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partition to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request the snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storage processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close non finished file storage.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * Such an executor does not execute tasks in one dedicated thread, but runs them
+     * sequentially, each possibly on a different thread of the delegate executor. This is
+     * important for some {@link SnapshotSender}s which must process sub-tasks sequentially,
+     * since all these sub-tasks may share a single socket channel used to send data.
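+     * <p>
+     * A minimal illustration of the ordering guarantee ({@code sendChunk} is hypothetical):
+     * <pre>
+     *     Executor seq = new SequentialExecutorWrapper(log, pool);
+     *     seq.execute(() -> sendChunk(1)); // Runs first,
+     *     seq.execute(() -> sendChunk(2)); // then this, possibly on another pool thread.
+     * </pre>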
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param log Ignite logger.
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which transfers partition files and delta pages to a remote node
+     * over a communication transmission channel.
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory calculated on snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
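+                // Create an empty target partition file, replacing a stale one left from a previous attempt.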
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshotted [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page from the given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
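+                    // Prepare the buffer to be refilled on the next iteration.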
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error that occurred", th);
+            }
+        }
+
+        /**
+         * @param from File to copy from.
+         * @param to File to copy data to.
+         * @param length Number of bytes to copy from the beginning.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy does not have enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which trigger request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
 
 Review comment:
   Fixed.
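
For context, the copy() helper in the hunk above loops over FileChannel.transferTo because a
single call may transfer fewer bytes than requested. A minimal standalone sketch of the same
bounded-copy pattern (plain NIO, hypothetical file names; not the PR's actual API):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;

    public class BoundedCopyExample {
        /** Copies the first {@code length} bytes of {@code from} into {@code to}. */
        static void copy(File from, File to, long length) throws IOException {
            try (FileChannel src = new FileInputStream(from).getChannel();
                 FileChannel dest = new FileOutputStream(to).getChannel()) {
                if (src.size() < length)
                    throw new IOException("Source file is shorter than requested: " + src.size());

                long written = 0;

                // transferTo() may copy fewer bytes than requested, so loop until done.
                while (written < length)
                    written += src.transferTo(written, length - written, dest);
            }
        }

        public static void main(String[] args) throws IOException {
            copy(new File("part-0.bin"), new File("snapshot-part-0.bin"), 4096);
        }
    }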


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410049168
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create client accounts, each with the initial balance.
+        for (int i = 0; i < clients; i++) {
+            eastCache.put(i, new Account(i, balance));
+            westCache.put(i, new Account(i, balance));
+        }
+
+        assertEquals("The initial sum of all cache values is not correct.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
+
+        forceCheckpoint();
+
+        IgniteInternalFuture<?> txLoadFut = GridTestUtils.runMultiThreadedAsync(
+            () -> {
+                ThreadLocalRandom rnd = ThreadLocalRandom.current();
+
+                int amount;
+
+                try {
+                    while (!stop.get()) {
+                        IgniteEx ignite = grid(rnd.nextInt(grids));
+                        IgniteCache<Integer, Account> east = ignite.cache("east");
+                        IgniteCache<Integer, Account> west = ignite.cache("west");
+
+                        amount = rnd.nextInt(transferLimit);
+
+                        try (Transaction tx = ignite.transactions().txStart()) {
+                            Integer id = rnd.nextInt(clients);
+
+                            Account acc0 = east.get(id);
+                            Account acc1 = west.get(id);
+
+                            acc0.balance -= amount;
+
+                            txStarted.countDown();
 
 Review comment:
   Can be placed to the beginning of tx try block for better readability with the same effect
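
A sketch of the suggested reordering, assuming the test code from the hunk above (only the
latch call moves to the top of the transaction block; behavior is unchanged since the latch
merely signals that at least one transaction has begun):

    try (Transaction tx = ignite.transactions().txStart()) {
        // Signal as early as possible that a transaction has started.
        txStarted.countDown();

        Integer id = rnd.nextInt(clients);

        Account acc0 = east.get(id);
        Account acc1 = west.get(id);

        acc0.balance -= amount;
        // ... rest of the transfer logic unchanged.
    }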


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408970882
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition files consisting of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled due to the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and could lead to OOM errors.
+     * A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory can be
+     * exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of the locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if a snapshot recovery process occurred. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of the last started cluster snapshot operation which failed. This value will be " +
+                "'null' if the last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "the snapshot path configured via IgniteConfiguration.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process the snapshot creation request " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message with the snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing of a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try stop all snapshot processing if not yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files; concurrent removals are tolerated.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group and partition pairs to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
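+        // The completed start-stage process does not match the current request; fail the stale user future, if any.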
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks failed, or they have not been executed " +
+                    "because some nodes left the cluster. The uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has finished with an error. " +
+                        "Local snapshot tasks may not have finished completely, or finalizing results failed " +
+                        "[hasErr=" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for the cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
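+    // Usage sketch (illustrative only: it assumes the public facade returned by
+    // ignite.snapshot() and an arbitrary snapshot name):
+    //
+    //     Ignite ignite = Ignition.ignite();
+    //
+    //     IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backup_2020_04_01");
+    //
+    //     fut.get(); // Blocks until the cluster-wide snapshot completes or fails.
+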
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which has not been completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group ids and the corresponding sets of partition ids to request.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when requested snapshot fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check for null first: the remote node may have already left the grid.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
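+    // Usage sketch (illustrative only: snpMgr, rmtNodeId and the cache name are
+    // assumptions made for the example):
+    //
+    //     Map<Integer, Set<Integer>> parts = new HashMap<>();
+    //
+    //     // Request partitions 0 and 1 of a cache group (the group id equals the
+    //     // cache id when no explicit cache group is configured).
+    //     parts.put(CU.cacheId("someCache"), new HashSet<>(Arrays.asList(0, 1)));
+    //
+    //     snpMgr.createRemoteSnapshot(rmtNodeId, parts,
+    //         (file, pair) -> log.info("Received partition [file=" + file + ", pair=" + pair + ']'))
+    //         .get();
+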
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group ids and the corresponding sets of partition ids to be snapshotted.
+     * @param snpSndr Factory which produces a snapshot receiver instance.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id on which requests have been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Persistent storage folder name of the local node.
+     * @return Relative configured path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file stores were processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close unfinished file stores.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks sequentially, although not necessarily on a single
+     * thread: consecutive tasks may be executed on different threads of the delegate
+     * executor. This is important for some {@link SnapshotSender}s which must process
+     * their sub-tasks sequentially, since all of them may share a single socket
+     * channel to send data through.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
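+    // Behavior sketch (illustrative only): tasks submitted to the wrapper never overlap
+    // and run in FIFO order, although each of them may land on a different thread of
+    // the delegate pool.
+    //
+    //     Executor seq = new SequentialExecutorWrapper(log, snpRunner);
+    //
+    //     for (int i = 0; i < 10; i++) {
+    //         int idx = i;
+    //
+    //         seq.execute(() -> System.out.println("task " + idx)); // Prints 0..9 in order.
+    //     }
+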
+    /**
+     * Snapshot sender which transmits snapshot files to a remote node.
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved against the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
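+                // Make sure the target partition file is empty before the copy:
+                // re-create it if a file with the same name already exists.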
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
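+                // Apply pages from the delta file on top of the copied partition file:
+                // each page is read sequentially and written to the page store by its
+                // page id as part of the recovery procedure.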
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Source file to copy from.
+         * @param to Destination file to copy data to.
+         * @param length Number of bytes to copy from the beginning of the source file.
+         * @throws IOException If an I/O error occurs.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if the execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param rqId Unique snapshot request id.
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into snapshot.
+         * @param bltNodes Baseline node ids affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID rqId, UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.rqId = rqId;
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotOperationRequest.class, this);
+        }
+    }
+
+    /** */
+    private static class SnapshotOperationResponse implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+    }
+
+    /** Snapshot operation start message. */
+    private static class SnapshotStartDiscoveryMessage implements SnapshotDiscoveryMessage {
+        /** Serial version UID. */
+        private static final long serialVersionUID = 0L;
+
+        /** Discovery cache. */
+        private final DiscoCache discoCache;
+
+        /** Snapshot request id. */
 
 Review comment:
   Point

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408791018
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since that would be redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
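+    // For example (assuming the usual templates PART_FILE_TEMPLATE = "part-%d.bin"
+    // and INDEX_FILE_NAME = "index.bin"):
+    //
+    //     partDeltaFileName(12);              // -> "part-12.bin.delta"
+    //     partDeltaFileName(INDEX_PARTITION); // -> "index.bin.delta"
+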
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
 
 Review comment:
   We should state in the description (for all metrics) that we are talking about snapshots started by this node. Or change implementation to show snapshots started by any node.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408803204
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requests this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of a file with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform a local snapshot. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of a cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since extra buffers are redundant and can lead to
+     * OOM errors. A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap
+     * memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from the remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
 
 Review comment:
   `Rq` is not a common abbreviation for Ignite, use `Req` (`snpRq` should be renamed too)
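
Stepping back from the naming nit: the class javadoc quoted above lists two major actions, and the cluster-wide one is what end users drive. A minimal usage sketch, assuming the IgniteSnapshot facade from this diff is reachable via Ignite#snapshot() and that createSnapshot(String) returns an IgniteFuture (the config path and snapshot name below are illustrative placeholders):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.lang.IgniteFuture;

    public class SnapshotExample {
        public static void main(String[] args) {
            // Illustrative: the node must belong to an active, persistence-enabled
            // cluster for the snapshot request to be accepted.
            Ignite ignite = Ignition.start("ignite-config.xml");

            // Triggers PME and a checkpoint cluster-wide, then each baseline node
            // copies its consistent partition files into the snapshot directory.
            IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("snapshot_01042020");

            // Completes once every node has finished its local copy.
            fut.get();
        }
    }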


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410170795
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.lang.GridClosureException;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requests this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of a file with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform a local snapshot. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of a cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since extra buffers are redundant and can lead to
+     * OOM errors. A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap
+     * memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = LocalSnapshotSender::new;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from the remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpReq;
+
+    /** {@code true} if the recovery process occurred for the snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult, SnapshotStartDiscoveryMessage::new);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time at which the last cluster snapshot request started on this node.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time at which the last cluster snapshot request ended on this node.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of the last started cluster snapshot request on this node.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of the last started cluster snapshot request that failed with an error. " +
+                "This value will be 'null' if the last snapshot request completed successfully.");
+        mreg.register("LocalSnapshotList", this::getSnapshots, List.class,
+            "The list of names of all snapshots currently saved on the local node with respect to " +
+                "the snapshot working path configured via IgniteConfiguration.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process a request to create a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message about a snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpReq = clusterSnpReq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpReq != null &&
+                                snpReq.snpName.equals(sctx.snapshotName()) &&
+                                snpReq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("Snapshot operation interrupted. " +
+                                "One of baseline nodes left the cluster: " + leftNodeId));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + nodeId + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with the given name doesn't exist " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot dire
 
 Review comment:
   dire -> dir.
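
Beyond the typo, two details in this long hunk are worth pulling out. First, the locBuff field: its javadoc argues for exactly one direct buffer per thread, and the constructor in the hunk builds it the way the standalone sketch below does (the hard-coded 4 KB page size is an illustrative stand-in for DataStorageConfiguration#getPageSize()):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    class PageBufferHolder {
        /** Illustrative stand-in for the configured page size. */
        static final int PAGE_SIZE = 4 * 1024;

        // One direct, native-ordered buffer per thread. A direct buffer is only
        // reclaimed when its ByteBuffer is garbage collected, so allocating a
        // buffer per writer, rather than per thread, can exhaust off-heap
        // memory long before any GC runs.
        final ThreadLocal<ByteBuffer> locBuff = ThreadLocal.withInitial(() ->
            ByteBuffer.allocateDirect(PAGE_SIZE).order(ByteOrder.nativeOrder()));
    }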
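
Second, partDeltaFileName maps a partition id onto the delta templates declared at the top of the class. Assuming PART_FILE_TEMPLATE resolves to "part-%d.bin" and INDEX_FILE_NAME to "index.bin" (their values in FilePageStoreManager, stated here as an assumption), the naming works out to:

    // Worked examples of the delta-file naming scheme:
    partDeltaFileName(0);               // -> "part-0.bin.delta"
    partDeltaFileName(17);              // -> "part-17.bin.delta"
    partDeltaFileName(INDEX_PARTITION); // -> "index.bin.delta"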


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409023231
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/platform/PlatformDeployServiceTask.java
 ##########
 @@ -18,6 +18,11 @@
 package org.apache.ignite.platform;
 
 import java.sql.Timestamp;
+import java.util.ArrayList;
 
 Review comment:
   Fixed.


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410161471
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/managers/communication/FileReceiver.java
 ##########
 @@ -82,7 +82,8 @@ public FileReceiver(
             fileIo.position(meta.offset());
         }
         catch (IOException e) {
-            throw new IgniteException("Unable to open destination file. Receiver will will be stopped", e);
+            throw new IgniteException("Unable to open destination file. Receiver will will be stopped: " +
 
 Review comment:
   `will will` -> `will`
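
Apart from the doubled word, the hunk shows the resume pattern the receiver relies on: the destination file is opened and the channel is positioned at meta.offset(), so an interrupted transmission continues where it stopped instead of starting over. A plain NIO sketch of the same pattern (the path and offset are placeholders):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    class ResumeWrite {
        static FileChannel openAt(long receivedBytes) throws IOException {
            Path dest = Paths.get("/tmp/part-1.bin"); // placeholder destination

            FileChannel ch = FileChannel.open(dest,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);

            // Skip the prefix that has already been received; subsequent
            // writes continue the transfer rather than overwrite it.
            ch.position(receivedBytes);

            return ch;
        }
    }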


[GitHub] [ignite] anton-vinogradov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
anton-vinogradov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r405978359
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/ExchangeContext.java
 ##########
 @@ -76,9 +77,10 @@ public ExchangeContext(GridCacheSharedContext<?, ?> cctx, boolean crd, GridDhtPa
             log.warning("Current topology does not support the PME-free switch. Please check all nodes support" +
                 " this feature and it was not explicitly disabled by IGNITE_PME_FREE_SWITCH_DISABLED JVM option.");
 
+        boolean requirePmeFree = (fut.wasRebalanced() && fut.isBaselineNodeFailed()) || startedBySnapshot(fut);
 
 Review comment:
   just a boolean pmeFreeSwitch
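
Applying the suggested rename leaves the condition itself untouched; the hunk would read (method names taken from the diff as-is):

    boolean pmeFreeSwitch = (fut.wasRebalanced() && fut.isBaselineNodeFailed()) || startedBySnapshot(fut);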


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409342250
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
 ##########
 @@ -0,0 +1,881 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BooleanSupplier;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.pagemem.PageIdUtils;
+import org.apache.ignite.internal.pagemem.store.PageStore;
+import org.apache.ignite.internal.pagemem.store.PageWriteListener;
+import org.apache.ignite.internal.processors.cache.CacheGroupContext;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopology;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.lang.IgniteThrowableRunner;
+import org.apache.ignite.internal.util.tostring.GridToStringExclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.partDeltaFile;
+
+/**
+ *
+ */
+class SnapshotFutureTask extends GridFutureAdapter<Boolean> implements DbCheckpointListener {
+    /** Shared context. */
+    private final GridCacheSharedContext<?, ?> cctx;
+
+    /** Ignite logger. */
+    private final IgniteLogger log;
+
+    /** Node id which caused the snapshot operation. */
+    private final UUID srcNodeId;
+
+    /** Unique identifier of snapshot process. */
+    private final String snpName;
+
+    /** Snapshot working directory on file system. */
+    private final File tmpTaskWorkDir;
+
+    /** Local buffer to perform copy-on-write operations for {@link PageStoreSerialWriter}. */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** IO factory which will be used for creating snapshot delta-writers. */
+    private final FileIOFactory ioFactory;
+
+    /**
+     * Size of each cache partition file.
+     * A partition has a value greater than zero only if it is in the OWNING state.
+     * The information is collected under the checkpoint write lock.
+     */
+    private final Map<GroupPartitionId, Long> partFileLengths = new HashMap<>();
+
+    /**
+     * Map of partitions to snapshot and their corresponding delta page stores.
+     * Writers are pinned to the snapshot context by the controlling partition
+     * processing supplier.
+     */
+    private final Map<GroupPartitionId, PageStoreSerialWriter> partDeltaWriters = new HashMap<>();
+
+    /** Snapshot data sender. */
+    @GridToStringExclude
+    private final SnapshotSender snpSndr;
+
+    /**
+     * Requested map of cache groups and their partitions to include into the snapshot. If the set of
+     * partitions is {@code null}, then all OWNING partitions for the given cache groups will be included.
+     * In this case, if all partitions are in the OWNING state, the index partition will also be included.
+     * <p>
+     * If partitions for a particular cache group are not provided, they will be collected and added
+     * on checkpoint under the write lock.
+     */
+    private final Map<Integer, Set<Integer>> parts;
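+
+    // Illustration (hypothetical values, not part of the patch): {grpId -> null} includes all OWNING
+    // partitions of the group (plus the index partition if every partition is OWNING), while
+    // {grpId -> {0, 5}} includes exactly partitions 0 and 5.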
+
+    /** Cache group and corresponding partitions collected under the checkpoint write lock. */
+    private final Map<Integer, Set<Integer>> processed = new HashMap<>();
+
+    /** Checkpoint end future. */
+    private final CompletableFuture<Boolean> cpEndFut = new CompletableFuture<>();
+
+    /** Future which completes when the checkpoint mark phase is finished and snapshot tasks are scheduled. */
+    private final GridFutureAdapter<Void> startedFut = new GridFutureAdapter<>();
+
+    /** Absolute snapshot storage path. */
+    private File tmpSnpDir;
+
+    /** Future which is completed when the task is requested to be closed. Executed on the system pool. */
+    private volatile CompletableFuture<Void> closeFut;
+
+    /** An exception which occurred during snapshot processing. */
+    private final AtomicReference<Throwable> err = new AtomicReference<>();
+
+    /** Flag indicating that the task is already scheduled on a checkpoint. */
+    private final AtomicBoolean started = new AtomicBoolean();
+
+    /**
+     * @param e Finished snapshot task future with particular exception.
+     */
+    public SnapshotFutureTask(IgniteCheckedException e) {
+        A.notNull(e, "Exception for a finished snapshot task must not be null");
+
+        cctx = null;
+        log = null;
+        snpName = null;
+        srcNodeId = null;
+        tmpTaskWorkDir = null;
+        snpSndr = null;
+
+        err.set(e);
+        startedFut.onDone(e);
+        onDone(e);
+        parts = null;
+        ioFactory = null;
+        locBuff = null;
+    }
+
+    /**
+     * @param snpName Unique identifier of snapshot task.
+     * @param ioFactory Factory for working with delta files.
+     * @param parts Map of cache groups and their partitions to include into the snapshot; if the set of partitions
+     * is {@code null}, then all OWNING partitions for the given cache groups will be included.
+     */
+    public SnapshotFutureTask(
+        GridCacheSharedContext<?, ?> cctx,
+        UUID srcNodeId,
+        String snpName,
+        File tmpWorkDir,
+        FileIOFactory ioFactory,
+        SnapshotSender snpSndr,
+        Map<Integer, Set<Integer>> parts,
+        ThreadLocal<ByteBuffer> locBuff
+    ) {
+        A.notNull(snpName, "Snapshot name cannot be empty or null");
+        A.notNull(snpSndr, "Snapshot sender which handles execution tasks must not be null");
+        A.notNull(snpSndr.executor(), "Executor service must not be null");
+
+        this.parts = parts;
+        this.cctx = cctx;
+        this.log = cctx.logger(SnapshotFutureTask.class);
+        this.snpName = snpName;
+        this.srcNodeId = srcNodeId;
+        this.tmpTaskWorkDir = new File(tmpWorkDir, snpName);
+        this.snpSndr = snpSndr;
+        this.ioFactory = ioFactory;
+        this.locBuff = locBuff;
+    }
+
+    /**
+     * @return Snapshot name.
+     */
+    public String snapshotName() {
+        return snpName;
+    }
+
+    /**
+     * @return Node id which triggered this operation.
+     */
+    public UUID sourceNodeId() {
+        return srcNodeId;
+    }
+
+    /**
+     * @return Type of snapshot operation.
+     */
+    public Class<? extends SnapshotSender> type() {
+        return snpSndr.getClass();
+    }
+
+    /**
+     * @return Set of cache groups included into snapshot operation.
+     */
+    public Set<Integer> affectedCacheGroups() {
+        return parts.keySet();
+    }
+
+    /**
+     * @param th An exception which occurred during snapshot processing.
+     */
+    public void acceptException(Throwable th) {
+        if (th == null)
+            return;
+
+        if (err.compareAndSet(null, th))
+            closeAsync();
+
+        startedFut.onDone(th);
+
+        U.log(log, "Snapshot task has accepted an exception to stop itself: " + th);
+    }
+
+    /** {@inheritDoc} */
+    @Override public boolean onDone(@Nullable Boolean res, @Nullable Throwable err) {
+        for (PageStoreSerialWriter writer : partDeltaWriters.values())
+            U.closeQuiet(writer);
+
+        snpSndr.close(err);
+
+        if (tmpSnpDir != null)
+            U.delete(tmpSnpDir);
+
+        // Delete snapshot directory if no other files exist.
+        try {
+            if (U.fileCount(tmpTaskWorkDir.toPath()) == 0 || err != null)
+                U.delete(tmpTaskWorkDir.toPath());
+        }
+        catch (IOException e) {
+            log.error("Snapshot directory doesn't exist [snpName=" + snpName + ", dir=" + tmpTaskWorkDir + ']');
+        }
+
+        if (err != null)
+            startedFut.onDone(err);
+
+        return super.onDone(res, err);
+    }
+
+    /**
+     * @throws IgniteCheckedException If failed.
+     */
+    public void awaitStarted() throws IgniteCheckedException {
+        startedFut.get();
+    }
+
+    /**
+     * @return {@code true} if the current task is requested to be stopped.
+     */
+    private boolean stopping() {
+        return err.get() != null;
+    }
+
+    /**
+     * Initiates snapshot task.
+     *
+     * @return {@code true} if the task was started by this call.
+     */
+    public boolean start() {
+        if (stopping())
+            return false;
+
+        try {
+            if (!started.compareAndSet(false, true))
+                return false;
+
+            tmpSnpDir = U.resolveWorkDirectory(tmpTaskWorkDir.getAbsolutePath(),
+                databaseRelativePath(cctx.kernalContext().pdsFolderResolver().resolveFolders().folderName()),
+                false);
+
+            for (Integer grpId : parts.keySet()) {
+                CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);
 
 Review comment:
   The cache group context is sometimes absent on some nodes (e.g. when a `NodeFilter` is used). I think it's safer here to skip groups whose context we can't find and fail only if partitions for these groups were requested explicitly. Also, we should cover this behavior with a test (check that we can restore from such a snapshot).
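
   A minimal sketch of that idea (hypothetical code, not part of the PR, reusing the identifiers from the hunk above): skip a group whose context is missing, and fail fast only when its partitions were requested explicitly:

       for (Map.Entry<Integer, Set<Integer>> e : parts.entrySet()) {
           int grpId = e.getKey();
           CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);

           // The group may be absent locally, e.g. filtered out by a NodeFilter.
           if (gctx == null) {
               if (!F.isEmpty(e.getValue())) {
                   throw new IgniteCheckedException("Partitions were requested explicitly for a cache " +
                       "group which is missing on the local node [grpId=" + grpId + ']');
               }

               continue; // No partitions of this group live on this node.
           }

           // ... proceed with snapshotting the group's partitions.
       }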


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409104429
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups, triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint started to enforce a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistent id (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of a buffer per
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}): that would be redundant and can lead to OOM errors, since
+     * a direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory can run out before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
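+
+    // Flow sketch (illustrative, derived from the handlers wired above): the START_SNAPSHOT distributed
+    // process runs initLocalSnapshotStartStage() on every baseline node; once all responses are collected,
+    // the coordinator starts END_SNAPSHOT, which finalizes the snapshot or deletes incomplete files
+    // (see initLocalSnapshotEndStage()).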
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
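+
+    // Illustration (not part of the patch): assuming Ignite's default "part-%d.bin" partition file template,
+    // partDeltaFileName(5) returns "part-5.bin.delta" and partDeltaFileName(INDEX_PARTITION)
+    // returns "index.bin.delta".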
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process the request to create a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Failed to send the response message about a snapshot request " +
+                                    "processing error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing a snapshot request from a remote node failed with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Concurrently traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
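+
+    // Assumed snapshot layout (sketch): binary metadata and marshaller mapping directories at the
+    // snapshot root, plus db/<folderName>/<cacheDir>/part-N.bin partition files, which is why the
+    // method above removes the binary, db and marshaller subtrees before dropping the root itself.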
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to sets of partitions to be snapshot (null means all OWNING partitions).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks failed, or they were not executed " +
+                    "because some nodes left the cluster. The incomplete snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes cannot perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes cannot perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which has not been completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange is started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group IDs to the sets of cache partitions to be snapshotted.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node ID which caused the snapshot operation.
+     * @param parts Map of cache group IDs to the sets of cache partitions to be snapshotted.
+     * @param snpSndr Sender instance which handles the snapshot data transfer.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Current factory which produces {@link LocalSnapshotSender} implementations.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return locSndrFactory;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one TransmissionSender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create an IO interface over page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id on which requests have been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Name of the persistent storage folder of the local node.
+     * @return Relative configured path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter which shows how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node ID to request the snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file page stores were processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close unfinished file page stores.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks sequentially, though not necessarily on a single
+     * thread. It is important for some {@link SnapshotSender}s to process their
+     * sub-tasks sequentially, because all these sub-tasks may share a single socket
+     * channel used to send data.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param log Ignite logger.
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled before the wrapped executor is shut down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which transfers snapshot files to a remote node
+     * via {@link GridIoManager.TransmissionSender}.
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sender tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved against the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Source file to copy from.
+         * @param to Destination file to copy data to.
+         * @param length Number of bytes to copy from the beginning of the file.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node ID which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The list of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
 
 Review comment:
   Fixed.
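
A minimal usage sketch of the public API implemented in the hunk above. This is
not part of the PR diff: the configuration path and snapshot name are made up,
and it assumes a cluster with persistence enabled (snapshot requests are rejected
on inactive clusters, see createSnapshot() above):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class SnapshotExample {
        public static void main(String[] args) {
            // Hypothetical configuration; persistence must be enabled in it.
            try (Ignite ignite = Ignition.start("ignite-config.xml")) {
                // Snapshot requests are rejected on an inactive cluster.
                ignite.cluster().active(true);

                // createSnapshot() returns an IgniteFuture<Void>; get() blocks
                // until the cluster-wide snapshot completes on all baseline nodes.
                ignite.snapshot().createSnapshot("backup_2020_04_01").get();
            }
        }
    }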


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408792408
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of the whole cluster's cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter}), since that is redundant and can lead to OOM errors. A direct
+     * buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Busy lock to protect resources from being used while the manager is stopping. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = LocalSnapshotSender::new;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
 
 Review comment:
  Always use upper case for the first letter or lower case for the first letter, but not mixed
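
For illustration, a minimal sketch of the suggested fix, reusing the register()
overload already shown in this hunk (the metric name and description below are
assumptions; the point is the consistent upper-case first letter):

    mreg.register("LocalSnapshotList", this::getSnapshots, List.class,
        "The list of names of all snapshots currently saved on the local node.");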


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205714
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
 ##########
 @@ -0,0 +1,881 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.function.BooleanSupplier;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.pagemem.PageIdUtils;
+import org.apache.ignite.internal.pagemem.store.PageStore;
+import org.apache.ignite.internal.pagemem.store.PageWriteListener;
+import org.apache.ignite.internal.processors.cache.CacheGroupContext;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopology;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.lang.IgniteThrowableRunner;
+import org.apache.ignite.internal.util.tostring.GridToStringExclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.partDeltaFile;
+
+/**
+ * Snapshot task which creates a consistent copy of the requested cache group partitions
+ * on the local node, scheduled and executed under a running checkpoint.
+ */
+class SnapshotFutureTask extends GridFutureAdapter<Boolean> implements DbCheckpointListener {
+    /** Shared context. */
+    private final GridCacheSharedContext<?, ?> cctx;
+
+    /** Ignite logger. */
+    private final IgniteLogger log;
+
+    /** Node ID which caused the snapshot operation. */
+    private final UUID srcNodeId;
+
+    /** Unique identifier of snapshot process. */
+    private final String snpName;
+
+    /** Snapshot working directory on file system. */
+    private final File tmpTaskWorkDir;
+
+    /** Local buffer to perform copy-on-write operations for {@link PageStoreSerialWriter}. */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** IO factory which will be used for creating snapshot delta-writers. */
+    private final FileIOFactory ioFactory;
+
+    /**
+     * The file length of each cache partition file.
+     * A partition has a value greater than zero only if it is in the OWNING state.
+     * Information is collected under the checkpoint write lock.
+     */
+    private final Map<GroupPartitionId, Long> partFileLengths = new HashMap<>();
+
+    /**
+     * Map of partitions to snapshot and their corresponding delta page store writers.
+     * Writers are pinned to the snapshot task context which controls the partition
+     * processing.
+     */
+    private final Map<GroupPartitionId, PageStoreSerialWriter> partDeltaWriters = new HashMap<>();
+
+    /** Snapshot data sender. */
+    @GridToStringExclude
+    private final SnapshotSender snpSndr;
+
+    /**
+     * Requested map of cache groups and their partitions to include into the snapshot. If the set of partitions
+     * is {@code null}, then all OWNING partitions for the given cache groups will be included into the snapshot.
+     * In this case, if all partitions are in the OWNING state, the index partition will also be included.
+     * <p>
+     * If partitions for a particular cache group are not provided, they will be collected and added
+     * on checkpoint under the write lock.
+     */
+    private final Map<Integer, Set<Integer>> parts;
+
+    /** Cache group and corresponding partitions collected under the checkpoint write lock. */
+    private final Map<Integer, Set<Integer>> processed = new HashMap<>();
+
+    /** Checkpoint end future. */
+    private final CompletableFuture<Boolean> cpEndFut = new CompletableFuture<>();
+
+    /** Future which completes when the checkpoint mark phase is finished and snapshot tasks are scheduled. */
+    private final GridFutureAdapter<Void> startedFut = new GridFutureAdapter<>();
+
+    /** Absolute snapshot storage path. */
+    private File tmpSnpDir;
+
+    /** Future which will be completed when the task is requested to be closed. Executed on the system pool. */
+    private volatile CompletableFuture<Void> closeFut;
+
+    /** An exception which occurred during snapshot processing. */
+    private final AtomicReference<Throwable> err = new AtomicReference<>();
+
+    /** Flag indicating that the task is already scheduled on a checkpoint. */
+    private final AtomicBoolean started = new AtomicBoolean();
+
+    /**
+     * @param e Exception with which this snapshot task completes immediately.
+     */
+    public SnapshotFutureTask(IgniteCheckedException e) {
+        A.notNull(e, "Exception for a finished snapshot task must be not null");
+
+        cctx = null;
+        log = null;
+        snpName = null;
+        srcNodeId = null;
+        tmpTaskWorkDir = null;
+        snpSndr = null;
+
+        err.set(e);
+        startedFut.onDone(e);
+        onDone(e);
+        parts = null;
+        ioFactory = null;
+        locBuff = null;
+    }
+
+    /**
+     * @param snpName Unique identifier of snapshot task.
+     * @param ioFactory Factory for working with delta files as file storage.
+     * @param parts Map of cache groups and their partitions to include into the snapshot; if the set of partitions
+     * is {@code null}, then all OWNING partitions for the given cache groups will be included into the snapshot.
+     */
+    public SnapshotFutureTask(
+        GridCacheSharedContext<?, ?> cctx,
+        UUID srcNodeId,
+        String snpName,
+        File tmpWorkDir,
+        FileIOFactory ioFactory,
+        SnapshotSender snpSndr,
+        Map<Integer, Set<Integer>> parts,
+        ThreadLocal<ByteBuffer> locBuff
+    ) {
+        A.notNull(snpName, "Snapshot name cannot be empty or null");
+        A.notNull(snpSndr, "Snapshot sender which handles execution tasks must be not null");
+        A.notNull(snpSndr.executor(), "Executor service must be not null");
+
+        this.parts = parts;
+        this.cctx = cctx;
+        this.log = cctx.logger(SnapshotFutureTask.class);
+        this.snpName = snpName;
+        this.srcNodeId = srcNodeId;
+        this.tmpTaskWorkDir = new File(tmpWorkDir, snpName);
+        this.snpSndr = snpSndr;
+        this.ioFactory = ioFactory;
+        this.locBuff = locBuff;
+    }
+
+    /**
+     * @return Snapshot name.
+     */
+    public String snapshotName() {
+        return snpName;
+    }
+
+    /**
+     * @return Node ID which triggered this operation.
+     */
+    public UUID sourceNodeId() {
+        return srcNodeId;
+    }
+
+    /**
+     * @return Type of snapshot operation.
+     */
+    public Class<? extends SnapshotSender> type() {
+        return snpSndr.getClass();
+    }
+
+    /**
+     * @return Set of cache groups included into snapshot operation.
+     */
+    public Set<Integer> affectedCacheGroups() {
+        return parts.keySet();
+    }
+
+    /**
+     * @param th An exception which occurred during snapshot processing.
+     */
+    public void acceptException(Throwable th) {
+        if (th == null)
+            return;
+
+        if (err.compareAndSet(null, th))
+            closeAsync();
+
+        startedFut.onDone(th);
+
+        U.warn(log, "Snapshot task has accepted exception to stop: " + th);
+    }
+
+    /** {@inheritDoc} */
+    @Override public boolean onDone(@Nullable Boolean res, @Nullable Throwable err) {
+        for (PageStoreSerialWriter writer : partDeltaWriters.values())
+            U.closeQuiet(writer);
+
+        snpSndr.close(err);
+
+        if (tmpSnpDir != null)
+            U.delete(tmpSnpDir);
+
+        // Delete the snapshot directory if no other files exist.
+        try {
+            if (U.fileCount(tmpTaskWorkDir.toPath()) == 0 || err != null)
+                U.delete(tmpTaskWorkDir.toPath());
+        }
+        catch (IOException e) {
+            log.error("Snapshot directory doesn't exist [snpName=" + snpName + ", dir=" + tmpTaskWorkDir + ']');
+        }
+
+        if (err != null)
+            startedFut.onDone(err);
+
+        return super.onDone(res, err);
+    }
+
+    /**
+     * @throws IgniteCheckedException If fails.
+     */
+    public void awaitStarted() throws IgniteCheckedException {
+        startedFut.get();
+    }
+
+    /**
+     * @return {@code true} if the current task was requested to stop.
+     */
+    private boolean stopping() {
+        return err.get() != null;
+    }
+
+    /**
+     * Initiates snapshot task.
+     *
+     * @return {@code true} if task started by this call.
+     */
+    public boolean start() {
+        if (stopping())
+            return false;
+
+        try {
+            if (!started.compareAndSet(false, true))
+                return false;
+
+            tmpSnpDir = U.resolveWorkDirectory(tmpTaskWorkDir.getAbsolutePath(),
+                databaseRelativePath(cctx.kernalContext().pdsFolderResolver().resolveFolders().folderName()),
+                false);
+
+            for (Integer grpId : parts.keySet()) {
+                CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);
+
+                if (gctx == null)
+                    throw new IgniteCheckedException("Cache group context not found: " + grpId);
+
+                if (!CU.isPersistentCache(gctx.config(), cctx.kernalContext().config().getDataStorageConfiguration()))
+                    throw new IgniteCheckedException("In-memory cache groups are not allowed to be snapshotted: " + grpId);
+
+                if (gctx.config().isEncryptionEnabled())
+                    throw new IgniteCheckedException("Encrypted cache groups are not allowed to be snapshotted: " + grpId);
+
+                // Create cache group snapshot directory on start in a single thread.
+                U.ensureDirectory(cacheWorkDir(tmpSnpDir, cacheDirName(gctx.config())),
+                    "directory for snapshotting cache group",
+                    log);
+            }
+
+            startedFut.listen(f ->
+                ((GridCacheDatabaseSharedManager)cctx.database()).removeCheckpointListener(this)
+            );
+
+            // Listener will be removed right after first execution.
+            ((GridCacheDatabaseSharedManager)cctx.database()).addCheckpointListener(this);
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot operation is scheduled on local node and will be handled by the checkpoint " +
+                    "listener [sctx=" + this + ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+        }
+        catch (IgniteCheckedException e) {
+            acceptException(e);
+
+            return false;
+        }
+
+        return true;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void beforeCheckpointBegin(Context ctx) {
+        if (stopping())
+            return;
+
+        ctx.finishedStateFut().listen(f -> {
+            if (f.error() == null)
+                cpEndFut.complete(true);
+            else
+                cpEndFut.completeExceptionally(f.error());
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onMarkCheckpointBegin(Context ctx) {
+        // The write lock is held. Partition page counters have been collected under the write lock.
+        if (stopping())
+            return;
+
+        try {
+            for (Map.Entry<Integer, Set<Integer>> e : parts.entrySet()) {
+                int grpId = e.getKey();
+                Set<Integer> grpParts = e.getValue();
+
+                GridDhtPartitionTopology top = cctx.cache().cacheGroup(grpId).topology();
+
+                Iterator<GridDhtLocalPartition> iter;
+
+                if (grpParts == null)
+                    iter = top.currentLocalPartitions().iterator();
+                else {
+                    if (grpParts.contains(INDEX_PARTITION)) {
+                        throw new IgniteCheckedException("Index partition cannot be included into snapshot if " +
+                            " set of cache group partitions has been explicitly provided [grpId=" + grpId + ']');
+                    }
+
+                    iter = F.iterator(grpParts, top::localPartition, false);
+                }
+
+                Set<Integer> owning = new HashSet<>();
+                Set<Integer> missed = new HashSet<>();
+
+                // Iterate over partitions in particular cache group.
+                while (iter.hasNext()) {
+                    GridDhtLocalPartition part = iter.next();
+
+                    // Partitions can be in the MOVING or RENTING state.
+                    // The index partition will be excluded if not all partitions are OWNING.
+                    // A partition with no data assigned to it has not been created yet.
+                    if (part.state() == GridDhtPartitionState.OWNING)
+                        owning.add(part.id());
+                    else
+                        missed.add(part.id());
+                }
+
+                if (grpParts != null) {
+                    // Partitions have been provided for the cache group, but some of them are not in the OWNING state.
+                    // Exit with an error.
+                    if (!missed.isEmpty()) {
+                        throw new IgniteCheckedException("Snapshot operation cancelled because " +
+                            "not all of the requested partitions are in the OWNING state on the local node [grpId=" + grpId +
+                            ", missed=" + missed + ']');
+                    }
+                }
+                else {
+                    // Partitions have not been provided for the snapshot task. If all partitions are in the
+                    // OWNING state, the index partition must also be included into the snapshot.
+                    if (!missed.isEmpty()) {
+                        log.warning("Only local cache group partitions in the OWNING state have been included into the snapshot. " +
+                            "Partitions with other states have been skipped, as has the index partition " +
+                            "[snpName=" + snpName + ", grpId=" + grpId + ", missed=" + missed + ']');
+                    }
+                    else if (cctx.kernalContext().query().moduleEnabled())
+                        owning.add(INDEX_PARTITION);
+                }
+
+                processed.put(grpId, owning);
+            }
+
+            for (Map.Entry<Integer, Set<Integer>> e : processed.entrySet()) {
+                int grpId = e.getKey();
+
+                CacheGroupContext gctx = cctx.cache().cacheGroup(grpId);
+
+                if (gctx == null) {
+                    throw new IgniteCheckedException("Cache group context has not found " +
+                        "due to the cache group is stopped: " + grpId);
+                }
+
+                for (int partId : e.getValue()) {
+                    GroupPartitionId pair = new GroupPartitionId(grpId, partId);
+
+                    PageStore store = ((FilePageStoreManager)cctx.pageStore()).getStore(grpId, partId);
+
+                    partDeltaWriters.put(pair,
+                        new PageStoreSerialWriter(store,
+                            partDeltaFile(cacheWorkDir(tmpSnpDir, cacheDirName(gctx.config())), partId)));
+
+                    partFileLengths.put(pair, store.size());
+                }
+            }
+        }
+        catch (IgniteCheckedException e) {
+            acceptException(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onCheckpointBegin(Context ctx) {
+        if (stopping())
+            return;
+
+        // The snapshot task is now started since the checkpoint write lock has been released.
+        if (!startedFut.onDone())
+            return;
+
+        assert !processed.isEmpty() : "Partitions to process must be collected during the checkpoint mark phase";
+
+        wrapExceptionIfStarted(() -> snpSndr.init(processed.values().stream().mapToInt(Set::size).sum()))
 
 Review comment:
   Fixed.
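
A note for readers following this thread: the start() method in the hunk above registers the task as a checkpoint listener and removes it right after the first execution through the startedFut listener. Below is a minimal, self-contained sketch of that register-once pattern; the Checkpointer and Listener types are illustrative stand-ins for Ignite's checkpoint machinery, not its actual API.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class RegisterOnceListenerSketch {
        interface Listener { void onMarkCheckpointBegin(); }

        static class Checkpointer {
            // CopyOnWriteArrayList makes it safe to remove a listener while
            // the checkpointer is iterating over the listener list.
            private final List<Listener> lsnrs = new CopyOnWriteArrayList<>();
            void addListener(Listener l) { lsnrs.add(l); }
            void removeListener(Listener l) { lsnrs.remove(l); }
            void checkpoint() { lsnrs.forEach(Listener::onMarkCheckpointBegin); }
        }

        public static void main(String[] args) {
            Checkpointer cp = new Checkpointer();
            CompletableFuture<Void> startedFut = new CompletableFuture<>();

            Listener task = () -> {
                // In the real code the partition state is collected here under
                // the checkpoint write lock; completing the future marks the
                // snapshot task as started.
                startedFut.complete(null);
            };

            // Remove the listener right after its first execution, as the PR
            // does via startedFut.listen(f -> ...removeCheckpointListener(this)).
            startedFut.whenComplete((res, err) -> cp.removeListener(task));
            cp.addListener(task);

            cp.checkpoint(); // the first checkpoint triggers the task once
            cp.checkpoint(); // later checkpoints no longer see the listener
        }
    }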


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409054221
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint which starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads performing local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter} is redundant and can lead to OOM errors). A direct buffer
+     * is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks a previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
 
 Review comment:
   Update all metrics description.
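
Related context: the locBuff javadoc in this hunk explains why a single direct buffer is kept per thread for copy-on-write page snapshots. The sketch below illustrates just that idea with an assumed 4 KB page size and a hypothetical copyPage() helper; it is not the PR's actual PageStoreSerialWriter code.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class ThreadLocalBufferSketch {
        static final int PAGE_SIZE = 4096; // assumed page size

        // One direct buffer per thread: direct memory is reclaimed only when
        // the ByteBuffer is garbage collected, so allocating a buffer per
        // writer could exhaust off-heap memory long before a GC runs.
        static final ThreadLocal<ByteBuffer> LOC_BUFF = ThreadLocal.withInitial(() ->
            ByteBuffer.allocateDirect(PAGE_SIZE).order(ByteOrder.nativeOrder()));

        static void copyPage(ByteBuffer page) {
            ByteBuffer buf = LOC_BUFF.get();

            buf.clear();
            buf.put(page); // copy-on-write: snapshot the page into the local buffer
            buf.flip();
            // ... the buffer would then be written to the partition delta file ...
        }

        public static void main(String[] args) {
            copyPage(ByteBuffer.allocate(PAGE_SIZE));
            System.out.println("copied " + LOC_BUFF.get().remaining() + " bytes");
        }
    }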


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410205787
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotSender.java
 ##########
 @@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Executor;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.jetbrains.annotations.Nullable;
+
+/**
+ * Snapshot data sender.
+ */
+abstract class SnapshotSender {
+    /** Busy processing lock. */
+    private final ReadWriteLock lock = new ReentrantReadWriteLock();
+
+    /** Executor to run operations on. */
+    private final Executor exec;
+
+    /** {@code true} if sender is currently working */
 
 Review comment:
   Fixed.
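
For context on the lock field quoted above: a common shape for this guard is that every send operation takes the read lock while close() takes the write lock, so closing waits for in-flight work and later calls are rejected. The sketch below shows that pattern under this assumption; it is not the PR's actual SnapshotSender implementation.

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class BusyGuardSketch {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private volatile boolean closed;

        void send(String part) {
            lock.readLock().lock();

            try {
                if (closed)
                    throw new IllegalStateException("Sender is closed");

                System.out.println("sending " + part);
            }
            finally {
                lock.readLock().unlock();
            }
        }

        void close() {
            lock.writeLock().lock();

            try {
                closed = true; // no read-locked send() is running past this point
            }
            finally {
                lock.writeLock().unlock();
            }
        }

        public static void main(String[] args) {
            BusyGuardSketch sender = new BusyGuardSketch();

            sender.send("part-0.bin");
            sender.close();
            // sender.send("part-1.bin"); // would now throw IllegalStateException
        }
    }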


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r410149060
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotSelfTest.java
 ##########
 @@ -286,6 +293,134 @@ public void testSnapshotPrimaryBackupsTheSame() throws Exception {
         TestRecordingCommunicationSpi.stopBlockAll();
     }
 
+    /** @throws Exception If fails. */
+    @Test
+    public void testClusterSnapshotConsistencyUnderLoad() throws Exception {
+        int clients = 50;
+        int balance = 10_000;
+        int transferLimit = 1000;
+        int total = clients * balance * 2;
+        int grids = 3;
+        int transferThreadCnt = 4;
+        AtomicBoolean stop = new AtomicBoolean(false);
+        CountDownLatch txStarted = new CountDownLatch(1);
+
+        CacheConfiguration<Integer, Account> eastCcfg = txCacheConfig(new CacheConfiguration<>("east"));
+        CacheConfiguration<Integer, Account> westCcfg = txCacheConfig(new CacheConfiguration<>("west"));
+
+        for (int i = 0; i < grids; i++)
+            startGrid(optimize(getConfiguration(getTestIgniteInstanceName(i)).setCacheConfiguration(eastCcfg, westCcfg)));
+
+        grid(0).cluster().state(ACTIVE);
+
+        Ignite client = startClientGrid(grids);
+
+        IgniteCache<Integer, Account> eastCache = client.cache(eastCcfg.getName());
+        IgniteCache<Integer, Account> westCache = client.cache(westCcfg.getName());
+
+        // Create client accounts with an initial balance.
+        for (int i = 0; i < clients; i++) {
+            eastCache.put(i, new Account(i, balance));
+            westCache.put(i, new Account(i, balance));
+        }
+
+        assertEquals("The initial summary value in all caches is not correct.",
+            total, sumAllCacheValues(client, clients, eastCcfg.getName(), westCcfg.getName()));
 
 Review comment:
   Renamed.
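
For readers of the test hunk above: the assertion relies on a balance-conservation invariant: since transfers only move money between accounts, the sum over both caches must always equal clients * balance * 2, including in the data captured by a consistent snapshot. A standalone sketch of that invariant is below; plain maps stand in for the test's caches, Account, and sumAllCacheValues.

    import java.util.HashMap;
    import java.util.Map;

    public class BalanceInvariantSketch {
        public static void main(String[] args) {
            int clients = 50, balance = 10_000;
            Map<Integer, Integer> east = new HashMap<>(), west = new HashMap<>();

            // Create client accounts with an initial balance in both "caches".
            for (int i = 0; i < clients; i++) {
                east.put(i, balance);
                west.put(i, balance);
            }

            // A transfer moves an amount between two accounts of one cache.
            int amount = 1_000;
            east.merge(0, -amount, Integer::sum);
            east.merge(1, amount, Integer::sum);

            int total = east.values().stream().mapToInt(Integer::intValue).sum()
                + west.values().stream().mapToInt(Integer::intValue).sum();

            // The invariant the test asserts before and after the snapshot.
            if (total != clients * balance * 2)
                throw new IllegalStateException("Balance invariant violated: " + total);

            System.out.println("total=" + total);
        }
    }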


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409105893
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Reason text for the checkpoint which starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads performing local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which currently sends its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter} is redundant and can lead to OOM errors). A direct buffer
+     * is deallocated only when the ByteBuffer is garbage collected, so off-heap memory can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks a previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if the recovery process occurred for a snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener invoked on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and
+                        // the distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
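+
+            // Recovery protocol sketch (summarizing the handlers of this TransmissionHandler):
+            //   1. fileHandler() accepts the base partition file and wraps it into a FilePageStore.
+            //   2. chunkHandler() calls pageStore.beginRecover() and writes each received delta
+            //      page over the base copy.
+            //   3. finishRecover() above finalizes the store and hands the consistent partition
+            //      file to the request initiator via partConsumer.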
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group id to the set of partitions to include in the snapshot
+        // (a null set means all partitions owned by the group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes cannot perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes cannot perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
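+
+    // Usage sketch (hypothetical caller code, assuming the public snapshot facade added by this
+    // change is reachable as ignite.snapshot()):
+    //   IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backup_2020_04_01");
+    //   fut.get(); // Blocks until the cluster-wide snapshot completes or fails.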
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot which was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Map of cache group id to the set of partitions to be snapshot.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        // Check node presence first: the feature-support check below dereferences the node.
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
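+
+    // Usage sketch (hypothetical caller, names are illustrative): request partition 0 of a single
+    // cache group from a remote node and log each received partition file:
+    //   mgr.createRemoteSnapshot(rmtNodeId,
+    //       F.asMap(grpId, Collections.singleton(0)),
+    //       (file, pair) -> log.info("Received " + file + " for " + pair));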
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Map of cache group id to the set of partitions to be snapshot.
+     * @param snpSndr Snapshot sender instance which processes the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor since only one TransmissionSender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create IO interface over a page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id for which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name with the consistent id.
+     * @return Relative configured path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
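+
+    // Resolution sketch: with a work directory of /opt/ignite/work and no explicit snapshot path,
+    // snapshots land under /opt/ignite/work/<DFLT_SNAPSHOT_DIRECTORY>; an explicitly configured
+    // snapshot path is resolved through the same U.resolveWorkDirectory() call, as shown above.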
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter showing how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request the snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storage processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close non finished file storage.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor which runs tasks sequentially, though not necessarily on a single thread.
+     * This matters for some {@link SnapshotSender}s which must process their sub-tasks
+     * sequentially, since all of these sub-tasks may share a single socket channel used
+     * to send data.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
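+
+    // Ordering note: even if the delegate pool has many threads, tasks submitted to a single
+    // SequentialExecutorWrapper run one at a time in submission order, because scheduleNext()
+    // hands the next queued task to the delegate only after the previous one has finished.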
+
+    /** Snapshot sender which transmits partition files and delta pages to a remote node. */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory calculated on snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
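
Note: the recovery loop above reads the delta file one page at a time and rewrites each page
into the partition copy at the offset derived from its page id. Below is a stripped-down
sketch of the same loop over plain FileChannels; the first 8 bytes of a page are assumed to
hold its index (standing in for PageIO.getPageId()), and CRC handling is omitted, so this is
an illustration of the technique rather than the PR's implementation.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class DeltaApplySketch {
        /** Page size; Ignite's default is 4096 bytes. */
        static final int PAGE_SIZE = 4096;

        /** Applies every page recorded in {@code deltaFile} to {@code partFile}. */
        static void applyDelta(Path partFile, Path deltaFile) throws IOException {
            ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE).order(ByteOrder.nativeOrder());

            try (FileChannel delta = FileChannel.open(deltaFile, StandardOpenOption.READ);
                 FileChannel part = FileChannel.open(partFile, StandardOpenOption.WRITE)) {
                long total = delta.size();

                assert total % PAGE_SIZE == 0 : "Delta file size must be page-aligned: " + total;

                for (long pos = 0; pos < total; pos += PAGE_SIZE) {
                    page.clear();

                    // Read exactly one page; FileChannel.read() may return short counts.
                    while (page.hasRemaining()) {
                        if (delta.read(page, pos + page.position()) < 0)
                            throw new IOException("Unexpected end of delta file at " + pos);
                    }

                    page.flip();

                    long pageIdx = page.getLong(0); // Assumed page index (illustrative).

                    part.write(page, pageIdx * PAGE_SIZE); // Rewrite the page in place.
                }
            }
        }
    }
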
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Copy from file.
+         * @param to Copy data to file.
+         * @param length Number of bytes to copy from beginning.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
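
Note: the accumulation loop in copy() above exists because FileChannel.transferTo() is
allowed to transfer fewer bytes than requested (kernel buffer limits, interruptions), so a
single call cannot be trusted to move the full range. A self-contained sketch of the same
bounded-copy idiom:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class BoundedCopy {
        /**
         * Copies exactly {@code length} bytes from the beginning of {@code src} to
         * {@code dest}. transferTo() may move fewer bytes per call, hence the loop.
         */
        static void copy(Path src, Path dest, long length) throws IOException {
            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
                if (in.size() < length)
                    throw new IOException("Source is shorter than the requested length: " + in.size());

                long written = 0;

                while (written < length)
                    written += in.transferTo(written, length - written, out);
            }
        }
    }
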
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if an execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param rqId Unique snapshot request id.
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID rqId, UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.rqId = rqId;
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotOperationRequest.class, this);
+        }
+    }
+
+    /** */
+    private static class SnapshotOperationResponse implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+    }
+
+    /** Snapshot operation start message. */
+    private static class SnapshotStartDiscoveryMessage implements SnapshotDiscoveryMessage {
+        /** Serial version UID. */
+        private static final long serialVersionUID = 0L;
+
+        /** Discovery cache. */
+        private final DiscoCache discoCache;
+
+        /** Snapshot request id */
 
 Review comment:
   I've removed it since the `id` from InitMessage is used.


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409428433
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start grid node with data before each test.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Collection of cache group id and partition pairs to be snapshotted.
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently there is only a single node in the cluster),
+        // so the snapshot partitions must contain the same data as the partitions
+        // left after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
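
Note: the assertions above compare per-file CRC32 maps of the live directories against the
snapshot directories. calculateCRC32Partitions() comes from the test framework; an
illustrative stand-in with the same shape, assuming a flat directory of partition files:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.zip.CRC32;

    public class PartitionCrcSketch {
        /** Maps each regular file in {@code dir} to the CRC32 of its content. */
        static Map<String, Integer> crc32PerFile(Path dir) throws IOException {
            Map<String, Integer> res = new HashMap<>();

            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path f : files) {
                    if (!Files.isRegularFile(f))
                        continue;

                    CRC32 crc = new CRC32();
                    byte[] buf = new byte[8192];

                    try (InputStream in = Files.newInputStream(f)) {
                        for (int n; (n = in.read(buf)) != -1; )
                            crc.update(buf, 0, n);
                    }

                    res.put(f.getFileName().toString(), (int)crc.getValue());
                }
            }

            return res;
        }
    }
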
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+ * Data consistency is checked by starting a test node right from the snapshot directory and
+ * verifying that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start grid node with data before each test.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation which must be included into it witch correct value multiplier.
 
 Review comment:
   witch -> with


[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409105400
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which requested this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for a file that consists of delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint that starts the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to a snapshot operation already in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform a local snapshot. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total count of snapshot files which the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer for
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}) since that is redundant and can lead to OOM errors.
+     * A direct buffer is deallocated only when the ByteBuffer is garbage collected, so off-heap memory
+     * can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
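
Note: the two DistributedProcess instances above wire the snapshot operation as a two-phase
protocol: START_SNAPSHOT runs the local snapshot task on every baseline node, and
END_SNAPSHOT finalizes the result or deletes incomplete files. The shape of that protocol,
reduced to a single-JVM sketch with CompletableFuture (an analogy only; DistributedProcess
is Ignite's own cluster-wide API driven by discovery messages):

    import java.util.concurrent.CompletableFuture;

    public class TwoPhaseSketch {
        /** Phase 1 runs the local work; phase 2 always runs and cleans up on failure. */
        static CompletableFuture<Void> runSnapshot(Runnable localTask, Runnable cleanup) {
            return CompletableFuture
                .runAsync(localTask)            // START phase: copy partitions, apply deltas.
                .whenComplete((v, err) -> {     // END phase: finalize or roll back.
                    if (err != null)
                        cleanup.run();          // Delete the incomplete snapshot files.
                });
        }

        public static void main(String[] args) {
            runSnapshot(
                () -> System.out.println("start phase: create local snapshot"),
                () -> System.out.println("end phase: delete incomplete snapshot")
            ).join();
        }
    }
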
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
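
Note: a quick demo of the naming helper. PART_FILE_TEMPLATE and the index partition id are
restated below as assumptions (their values mirror FilePageStoreManager and PageIdAllocator,
e.g. "part-%d.bin" and 0xFFFF, but are hardcoded here for illustration):

    public class DeltaNameDemo {
        // Assumed values for illustration, mirroring FilePageStoreManager / PageIdAllocator.
        static final String PART_FILE_TEMPLATE = "part-%d.bin";
        static final String INDEX_DELTA_NAME = "index.bin.delta";
        static final int INDEX_PARTITION = 0xFFFF;

        static String partDeltaFileName(int partId) {
            return partId == INDEX_PARTITION
                ? INDEX_DELTA_NAME
                : String.format(PART_FILE_TEMPLATE + ".delta", partId);
        }

        public static void main(String[] args) {
            System.out.println(partDeltaFileName(5));               // part-5.bin.delta
            System.out.println(partDeltaFileName(INDEX_PARTITION)); // index.bin.delta
        }
    }
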
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle new event inside discovery thread, so no guarantees will be violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it is not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
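
Note: the file-tree walk in deleteSnapshot() returns CONTINUE from visitFileFailed(), so
entries removed by concurrent writers do not abort the traversal. A self-contained sketch
of that fault-tolerant recursive delete (illustrative, not the PR's helper):

    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;

    public class TolerantDelete {
        /** Recursively deletes {@code root}, ignoring entries removed by concurrent writers. */
        static void deleteQuietly(Path root) throws IOException {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    try {
                        Files.deleteIfExists(file);
                    }
                    catch (IOException ignored) {
                        // File vanished or is busy; skip it rather than failing the walk.
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    return FileVisitResult.CONTINUE; // Concurrently removed entries are fine.
                }

                @Override public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                    try {
                        Files.deleteIfExists(dir);
                    }
                    catch (IOException ignored) {
                        // Directory is not empty anymore; leave it in place.
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }
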
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Cache group ids mapped to the sets of partitions to be snapshot (null means all partitions of a group).
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
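
For context, caller-side usage of this entry point would look roughly like the sketch below. It
assumes the public Ignite.snapshot() facade that this change wires up; the config path and the
snapshot name are placeholders:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.lang.IgniteFuture;

    public class SnapshotUsage {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start("ignite-config.xml")) {
                // Trigger a cluster-wide snapshot and block until all baseline nodes finish it.
                IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("snapshot_2020_04_01");

                fut.get();
            }
        }
    }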
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
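
The SNP_RUNNING_KEY handling in the two methods above is a durable-marker pattern: write a marker
before the operation starts, remove it only on clean completion, and treat a leftover marker at
startup as evidence of a crash. A standalone sketch of the pattern; the KeyValueStore interface is
hypothetical, not Ignite's metastorage API:

    interface KeyValueStore {
        void write(String key, String val);
        void remove(String key);
        String read(String key);
    }

    class MarkerGuardedOperation {
        private static final String RUNNING_KEY = "op-running";

        /** Runs the operation under a durable marker so that a crash can be detected on restart. */
        static void run(KeyValueStore store, Runnable op) {
            store.write(RUNNING_KEY, "in-progress"); // The marker survives a crash.

            op.run();

            store.remove(RUNNING_KEY); // Removed only on a clean finish.
        }

        /** On startup: a leftover marker means the previous run crashed mid-operation. */
        static boolean crashedPreviously(KeyValueStore store) {
            return store.read(RUNNING_KEY) != null;
        }
    }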
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if exchange started by snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the current checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param rmtNodeId The remote node to connect to.
+     * @param parts Cache group ids mapped to the sets of partitions to be copied (null means all partitions of a group).
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
+        UUID rmtNodeId,
+        Map<Integer, Set<Integer>> parts,
+        BiConsumer<File, GroupPartitionId> partConsumer
+    ) {
+        assert partConsumer != null;
+
+        ClusterNode rmtNode = cctx.discovery().node(rmtNodeId);
+
+        if (rmtNode == null) {
+            return new GridFinishedFuture<>(new ClusterTopologyCheckedException("Snapshot request cannot be performed. " +
+                "Remote node left the grid [rmtNodeId=" + rmtNodeId + ']'));
+        }
+
+        if (!nodeSupports(rmtNode, PERSISTENCE_CACHE_SNAPSHOT))
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot on remote node is not supported: " + rmtNode.id()));
+
+        String snpName = RMT_SNAPSHOT_PREFIX + UUID.randomUUID().toString();
+
+        RemoteSnapshotFuture snpTransFut = new RemoteSnapshotFuture(rmtNodeId, snpName, partConsumer);
+
+        busyLock.enterBusy();
+        SnapshotRequestMessage msg0;
+
+        try {
+            msg0 = new SnapshotRequestMessage(snpName, parts);
+
+            RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+            try {
+                if (fut != null)
+                    fut.get(DFLT_SNAPSHOT_TIMEOUT, TimeUnit.MILLISECONDS);
+            }
+            catch (IgniteCheckedException e) {
+                if (log.isInfoEnabled())
+                    log.info("The previous snapshot request finished with an exception:" + e.getMessage());
+            }
+
+            try {
+                if (rmtSnpReq.compareAndSet(null, snpTransFut)) {
+                    cctx.gridIO().sendOrderedMessage(rmtNode, DFLT_INITIAL_SNAPSHOT_TOPIC, msg0, SYSTEM_POOL,
+                        Long.MAX_VALUE, true);
+                }
+                else
+                    return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot request has been concurrently interrupted."));
+            }
+            catch (IgniteCheckedException e) {
+                rmtSnpReq.compareAndSet(snpTransFut, null);
+
+                return new GridFinishedFuture<>(e);
+            }
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+
+        if (log.isInfoEnabled()) {
+            log.info("Snapshot request is sent to the remote node [rmtNodeId=" + rmtNodeId +
+                ", msg0=" + msg0 + ", snpTransFut=" + snpTransFut +
+                ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+        }
+
+        return snpTransFut;
+    }
+
+    /**
+     * @param grps List of cache groups which will be destroyed.
+     */
+    public void onCacheGroupsStopped(List<Integer> grps) {
+        for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+            Set<Integer> retain = new HashSet<>(grps);
+            retain.retainAll(sctx.affectedCacheGroups());
+
+            if (!retain.isEmpty()) {
+                sctx.acceptException(new IgniteCheckedException("Snapshot has been interrupted due to some of the required " +
+                    "cache groups stopped: " + retain));
+            }
+        }
+    }
+
+    /**
+     * @param snpName Unique snapshot name.
+     * @param srcNodeId Node id which caused the snapshot operation.
+     * @param parts Cache group ids mapped to the sets of partitions to be snapshot (null means all partitions of a group).
+     * @param snpSndr Snapshot sender instance used to process the snapshot data.
+     * @return Snapshot operation task which should be registered on checkpoint to run.
+     */
+    SnapshotFutureTask registerSnapshotTask(
+        String snpName,
+        UUID srcNodeId,
+        Map<Integer, Set<Integer>> parts,
+        SnapshotSender snpSndr
+    ) {
+        if (!busyLock.enterBusy())
+            return new SnapshotFutureTask(new IgniteCheckedException("Snapshot manager is stopping [locNodeId=" + cctx.localNodeId() + ']'));
+
+        try {
+            if (locSnpTasks.containsKey(snpName))
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            SnapshotFutureTask snpFutTask;
+
+            SnapshotFutureTask prev = locSnpTasks.putIfAbsent(snpName,
+                snpFutTask = new SnapshotFutureTask(cctx,
+                    srcNodeId,
+                    snpName,
+                    tmpWorkDir,
+                    ioFactory,
+                    snpSndr,
+                    parts,
+                    locBuff));
+
+            if (prev != null)
+                return new SnapshotFutureTask(new IgniteCheckedException("Snapshot with requested name is already scheduled: " + snpName));
+
+            if (log.isInfoEnabled()) {
+                log.info("Snapshot task has been registered on local node [sctx=" + this +
+                    ", topVer=" + cctx.discovery().topologyVersionEx() + ']');
+            }
+
+            snpFutTask.listen(f -> locSnpTasks.remove(snpName));
+
+            return snpFutTask;
+        }
+        finally {
+            busyLock.leaveBusy();
+        }
+    }
+
+    /**
+     * @param factory Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    void setLocalSnapshotSenderFactory(Function<String, SnapshotSender> factory) {
+        locSndrFactory = factory;
+    }
+
+    /**
+     * @return Factory which produces {@link LocalSnapshotSender} implementation.
+     */
+    Function<String, SnapshotSender> localSnapshotSenderFactory() {
+        return LocalSnapshotSender::new;
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param rmtNodeId Remote node id to send snapshot to.
+     * @return Snapshot sender instance.
+     */
+    SnapshotSender remoteSnapshotSender(String snpName, UUID rmtNodeId) {
+        // Remote snapshots can be sent only by a single-threaded executor, since only one transmission sender is created.
+        return new RemoteSnapshotSender(log,
+            new SequentialExecutorWrapper(log, snpRunner),
+            () -> databaseRelativePath(pdsSettings.folderName()),
+            cctx.gridIO().openTransmissionSender(rmtNodeId, DFLT_INITIAL_SNAPSHOT_TOPIC),
+            snpName);
+    }
+
+    /** Snapshot finished successfully or already restored. Key can be removed. */
+    private void removeLastMetaStorageKey() throws IgniteCheckedException {
+        cctx.database().checkpointReadLock();
+
+        try {
+            metaStorage.remove(SNP_RUNNING_KEY);
+        }
+        finally {
+            cctx.database().checkpointReadUnlock();
+        }
+    }
+
+    /**
+     * @return The executor service used to run snapshot tasks.
+     */
+    ExecutorService snapshotExecutorService() {
+        assert snpRunner != null;
+
+        return snpRunner;
+    }
+
+    /**
+     * @param ioFactory Factory to create an IO interface over page stores.
+     */
+    void ioFactory(FileIOFactory ioFactory) {
+        this.ioFactory = ioFactory;
+    }
+
+    /**
+     * @param nodeId Remote node id on which the request has been registered.
+     * @return Snapshot future related to given node id.
+     */
+    SnapshotFutureTask lastScheduledRemoteSnapshotTask(UUID nodeId) {
+        return locSnpTasks.values().stream()
+            .filter(t -> t.type() == RemoteSnapshotSender.class && t.sourceNodeId().equals(nodeId))
+            .findFirst()
+            .orElse(null);
+    }
+
+    /**
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     * @return Relative path of the persistence data storage directory for the local node.
+     * Example: {@code snapshotWorkDir/db/IgniteNodeName0}.
+     */
+    static String databaseRelativePath(String folderName) {
+        return Paths.get(DB_DEFAULT_FOLDER, folderName).toString();
+    }
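
To make the layout concrete: with a masked consistent id such as node00_1a2b (a made-up example),
a partition file inside a local snapshot resolves as sketched below; every segment except the one
produced by databaseRelativePath is a placeholder:

    import java.nio.file.Path;
    import java.nio.file.Paths;

    class SnapshotPathExample {
        static Path partitionInSnapshot() {
            return Paths.get(
                "/data/snapshots/mySnapshot",   // snapshotLocalDir("mySnapshot")
                "db", "node00_1a2b",            // databaseRelativePath("node00_1a2b")
                "cache-default",                // cache directory name
                "part-0.bin");                  // partition file
        }
    }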
+
+    /**
+     * @param cfg Ignite configuration.
+     * @return Snapshot work path.
+     */
+    static File resolveSnapshotWorkDirectory(IgniteConfiguration cfg) {
+        try {
+            return cfg.getSnapshotPath() == null ?
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), DFLT_SNAPSHOT_DIRECTORY, false) :
+                U.resolveWorkDirectory(cfg.getWorkDirectory(), cfg.getSnapshotPath(), false);
+        }
+        catch (IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /** Remote snapshot future which tracks remote snapshot transmission result. */
+    private class RemoteSnapshotFuture extends GridFutureAdapter<Void> {
+        /** Snapshot name to create. */
+        private final String snpName;
+
+        /** Remote node id to request snapshot from. */
+        private final UUID rmtNodeId;
+
+        /** Collection of partitions to be received. */
+        private final Map<GroupPartitionId, FilePageStore> stores = new ConcurrentHashMap<>();
+
+        /** Partition handler given by request initiator. */
+        private final BiConsumer<File, GroupPartitionId> partConsumer;
+
+        /** Counter showing how many partitions are left to be received. */
+        private int partsLeft = -1;
+
+        /**
+         * @param rmtNodeId Remote node id to request snapshot from.
+         * @param snpName Snapshot name to create.
+         * @param partConsumer Received partition handler.
+         */
+        public RemoteSnapshotFuture(UUID rmtNodeId, String snpName, BiConsumer<File, GroupPartitionId> partConsumer) {
+            this.snpName = snpName;
+            this.rmtNodeId = rmtNodeId;
+            this.partConsumer = partConsumer;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean cancel() {
+            return onCancelled();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected boolean onDone(@Nullable Void res, @Nullable Throwable err, boolean cancel) {
+            assert err != null || cancel || stores.isEmpty() : "Not all file storage processed: " + stores;
+
+            rmtSnpReq.compareAndSet(this, null);
+
+            if (err != null || cancel) {
+                // Close file page stores that have not been finished.
+                for (Map.Entry<GroupPartitionId, FilePageStore> entry : stores.entrySet()) {
+                    FilePageStore store = entry.getValue();
+
+                    try {
+                        store.stop(true);
+                    }
+                    catch (StorageException e) {
+                        log.warning("Error stopping received file page store", e);
+                    }
+                }
+            }
+
+            U.delete(Paths.get(tmpWorkDir.getAbsolutePath(), snpName));
+
+            return super.onDone(res, err, cancel);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            RemoteSnapshotFuture fut = (RemoteSnapshotFuture)o;
+
+            return rmtNodeId.equals(fut.rmtNodeId) &&
+                snpName.equals(fut.snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public int hashCode() {
+            return Objects.hash(rmtNodeId, snpName);
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(RemoteSnapshotFuture.class, this);
+        }
+    }
+
+    /**
+     * An executor that runs tasks strictly sequentially, though not necessarily on a
+     * single thread. This matters for some {@link SnapshotSender}s which must process
+     * their sub-tasks one at a time, because all of the sub-tasks may share a single
+     * socket channel used to send data.
+     */
+    private static class SequentialExecutorWrapper implements Executor {
+        /** Ignite logger. */
+        private final IgniteLogger log;
+
+        /** Queue of tasks to execute. */
+        private final Queue<Runnable> tasks = new ArrayDeque<>();
+
+        /** Delegate executor. */
+        private final Executor executor;
+
+        /** Currently running task. */
+        private volatile Runnable active;
+
+        /** If wrapped executor is shutting down. */
+        private volatile boolean stopping;
+
+        /**
+         * @param log Ignite logger.
+         * @param executor Executor to run tasks on.
+         */
+        public SequentialExecutorWrapper(IgniteLogger log, Executor executor) {
+            this.log = log.getLogger(SequentialExecutorWrapper.class);
+            this.executor = executor;
+        }
+
+        /** {@inheritDoc} */
+        @Override public synchronized void execute(final Runnable r) {
+            assert !stopping : "Task must be cancelled prior to the wrapped executor is shutting down.";
+
+            tasks.offer(() -> {
+                try {
+                    r.run();
+                }
+                finally {
+                    scheduleNext();
+                }
+            });
+
+            if (active == null)
+                scheduleNext();
+        }
+
+        /** */
+        protected synchronized void scheduleNext() {
+            if ((active = tasks.poll()) != null) {
+                try {
+                    executor.execute(active);
+                }
+                catch (RejectedExecutionException e) {
+                    tasks.clear();
+
+                    stopping = true;
+
+                    log.warning("Task is outdated. Wrapped executor is shutting down.", e);
+                }
+            }
+        }
+    }
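
A sanity check of the wrapper above: tasks submitted to it run strictly one after another even
when the delegate is a multi-threaded pool. A self-contained sketch of the same idea in plain JDK
concurrency, with no Ignite types:

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Executor;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SequentialDemo {
        /** Minimal re-implementation of the sequential-wrapper idea. */
        static class SequentialExecutor implements Executor {
            private final Queue<Runnable> tasks = new ArrayDeque<>();
            private final Executor delegate;
            private Runnable active;

            SequentialExecutor(Executor delegate) {
                this.delegate = delegate;
            }

            @Override public synchronized void execute(Runnable r) {
                tasks.offer(() -> {
                    try {
                        r.run();
                    }
                    finally {
                        scheduleNext();
                    }
                });

                if (active == null)
                    scheduleNext();
            }

            private synchronized void scheduleNext() {
                if ((active = tasks.poll()) != null)
                    delegate.execute(active);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            Executor seq = new SequentialExecutor(pool);
            CountDownLatch done = new CountDownLatch(10);

            for (int i = 0; i < 10; i++) {
                int n = i;

                // Output order is always 0..9, though the executing thread may vary.
                seq.execute(() -> {
                    System.out.println("task " + n + " on " + Thread.currentThread().getName());
                    done.countDown();
                });
            }

            done.await();
            pool.shutdown();
        }
    }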
+
+    /**
+     * Snapshot sender which transfers cache partition files and delta pages to a remote node.
+     */
+    private static class RemoteSnapshotSender extends SnapshotSender {
+        /** The sender which sends files to remote node. */
+        private final GridIoManager.TransmissionSender sndr;
+
+        /** Relative node path initializer. */
+        private final Supplier<String> initPath;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local node persistent directory with consistent id. */
+        private String relativeNodePath;
+
+        /** The number of cache partition files expected to be processed. */
+        private int partsCnt;
+
+        /**
+         * @param log Ignite logger.
+         * @param exec Executor to run sending tasks on.
+         * @param initPath Relative node path initializer.
+         * @param sndr File sender instance.
+         * @param snpName Snapshot name.
+         */
+        public RemoteSnapshotSender(
+            IgniteLogger log,
+            Executor exec,
+            Supplier<String> initPath,
+            GridIoManager.TransmissionSender sndr,
+            String snpName
+        ) {
+            super(log, exec);
+
+            this.sndr = sndr;
+            this.snpName = snpName;
+            this.initPath = initPath;
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            this.partsCnt = partsCnt;
+
+            relativeNodePath = initPath.get();
+
+            if (relativeNodePath == null)
+                throw new IgniteException("Relative node path cannot be empty.");
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                assert part.exists();
+                assert len > 0 : "Requested partitions has incorrect file length " +
+                    "[pair=" + pair + ", cacheDirName=" + cacheDirName + ']';
+
+                sndr.send(part, 0, len, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.FILE);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition file has been send [part=" + part.getName() + ", pair=" + pair +
+                        ", length=" + len + ']');
+                }
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission partition file has been interrupted [part=" + part.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending partition file [part=" + part.getName() + ", pair=" + pair +
+                    ", length=" + len + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            try {
+                sndr.send(delta, transmissionParams(snpName, cacheDirName, pair), TransmissionPolicy.CHUNK);
+
+                if (log.isInfoEnabled())
+                    log.info("Delta pages storage has been send [part=" + delta.getName() + ", pair=" + pair + ']');
+            }
+            catch (TransmissionCancelledException e) {
+                if (log.isInfoEnabled()) {
+                    log.info("Transmission delta pages has been interrupted [part=" + delta.getName() +
+                        ", pair=" + pair + ']');
+                }
+            }
+            catch (IgniteCheckedException | InterruptedException | IOException e) {
+                U.error(log, "Error sending delta file  [part=" + delta.getName() + ", pair=" + pair + ']', e);
+
+                throw new IgniteException(e);
+            }
+        }
+
+        /**
+         * @param snpName Snapshot name.
+         * @param cacheDirName Cache directory name.
+         * @param pair Cache group id with corresponding partition id.
+         * @return Map of params.
+         */
+        private Map<String, Serializable> transmissionParams(String snpName, String cacheDirName,
+            GroupPartitionId pair) {
+            Map<String, Serializable> params = new HashMap<>();
+
+            params.put(SNP_GRP_ID_PARAM, pair.getGroupId());
+            params.put(SNP_PART_ID_PARAM, pair.getPartitionId());
+            params.put(SNP_DB_NODE_PATH_PARAM, relativeNodePath);
+            params.put(SNP_CACHE_DIR_NAME_PARAM, cacheDirName);
+            params.put(SNP_NAME_PARAM, snpName);
+            params.put(SNP_PARTITIONS_CNT, partsCnt);
+
+            return params;
+        }
+
+        /** {@inheritDoc} */
+        @Override public void close0(@Nullable Throwable th) {
+            U.closeQuiet(sndr);
+
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("The remote snapshot sender closed normally [snpName=" + snpName + ']');
+            }
+            else {
+                U.warn(log, "The remote snapshot sender closed due to an error occurred while processing " +
+                    "snapshot operation [snpName=" + snpName + ']', th);
+            }
+        }
+    }
+
+    /**
+     * Snapshot sender which writes all data to local directory.
+     */
+    private class LocalSnapshotSender extends SnapshotSender {
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Local snapshot directory. */
+        private final File snpLocDir;
+
+        /** Local node snapshot directory resolved within the snapshot directory. */
+        private File dbDir;
+
+        /** Size of page. */
+        private final int pageSize;
+
+        /**
+         * @param snpName Snapshot name.
+         */
+        public LocalSnapshotSender(String snpName) {
+            super(IgniteSnapshotManager.this.log, snpRunner);
+
+            this.snpName = snpName;
+            snpLocDir = snapshotLocalDir(snpName);
+            pageSize = cctx.kernalContext().config().getDataStorageConfiguration().getPageSize();
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void init(int partsCnt) {
+            dbDir = new File(snpLocDir, databaseRelativePath(pdsSettings.folderName()));
+
+            if (dbDir.exists()) {
+                throw new IgniteException("Snapshot with given name already exists " +
+                    "[snpName=" + snpName + ", absPath=" + dbDir.getAbsolutePath() + ']');
+            }
+
+            cctx.database().checkpointReadLock();
+
+            try {
+                assert metaStorage != null && metaStorage.read(SNP_RUNNING_KEY) == null :
+                    "The previous snapshot hasn't been completed correctly";
+
+                metaStorage.write(SNP_RUNNING_KEY, snpName);
+
+                U.ensureDirectory(dbDir, "snapshot work directory", log);
+            }
+            catch (IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+            finally {
+                cctx.database().checkpointReadUnlock();
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendCacheConfig0(File ccfg, String cacheDirName) {
+            assert dbDir != null;
+
+            try {
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                copy(ccfg, new File(cacheDir, ccfg.getName()), ccfg.length());
+            }
+            catch (IgniteCheckedException | IOException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendMarshallerMeta0(List<Map<Integer, MappedName>> mappings) {
+            if (mappings == null)
+                return;
+
+            saveMappings(cctx.kernalContext(), mappings, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendBinaryMeta0(Collection<BinaryType> types) {
+            if (types == null)
+                return;
+
+            cctx.kernalContext().cacheObjects().saveMetadata(types, snpLocDir);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long len) {
+            try {
+                if (len == 0)
+                    return;
+
+                File cacheDir = U.resolveWorkDirectory(dbDir.getAbsolutePath(), cacheDirName, false);
+
+                File snpPart = new File(cacheDir, part.getName());
+
+                if (!snpPart.exists() || snpPart.delete())
+                    snpPart.createNewFile();
+
+                copy(part, snpPart, len);
+
+                if (log.isInfoEnabled()) {
+                    log.info("Partition has been snapshot [snapshotDir=" + dbDir.getAbsolutePath() +
+                        ", cacheDirName=" + cacheDirName + ", part=" + part.getName() +
+                        ", length=" + part.length() + ", snapshot=" + snpPart.getName() + ']');
+                }
+            }
+            catch (IOException | IgniteCheckedException ex) {
+                throw new IgniteException(ex);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override public void sendDelta0(File delta, String cacheDirName, GroupPartitionId pair) {
+            File snpPart = getPartitionFile(dbDir, cacheDirName, pair.getPartitionId());
+
+            if (log.isInfoEnabled()) {
+                log.info("Start partition snapshot recovery with the given delta page file [part=" + snpPart +
+                    ", delta=" + delta + ']');
+            }
+
+            try (FileIO fileIo = ioFactory.create(delta, READ);
+                 FilePageStore pageStore = (FilePageStore)storeFactory
+                     .apply(pair.getGroupId(), false)
+                     .createPageStore(getFlagByPartId(pair.getPartitionId()),
+                         snpPart::toPath,
+                         new LongAdderMetric("NO_OP", null))
+            ) {
+                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize)
+                    .order(ByteOrder.nativeOrder());
+
+                long totalBytes = fileIo.size();
+
+                assert totalBytes % pageSize == 0 : "Given file with delta pages has incorrect size: " + fileIo.size();
+
+                pageStore.beginRecover();
+
+                for (long pos = 0; pos < totalBytes; pos += pageSize) {
+                    long read = fileIo.readFully(pageBuf, pos);
+
+                    assert read == pageBuf.capacity();
+
+                    pageBuf.flip();
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Read page given delta file [path=" + delta.getName() +
+                            ", pageId=" + PageIO.getPageId(pageBuf) + ", pos=" + pos + ", pages=" + (totalBytes / pageSize) +
+                            ", crcBuff=" + FastCrc.calcCrc(pageBuf, pageBuf.limit()) + ", crcPage=" + PageIO.getCrc(pageBuf) + ']');
+
+                        pageBuf.rewind();
+                    }
+
+                    pageStore.write(PageIO.getPageId(pageBuf), pageBuf, 0, false);
+
+                    pageBuf.flip();
+                }
+
+                pageStore.finishRecover();
+            }
+            catch (IOException | IgniteCheckedException e) {
+                throw new IgniteException(e);
+            }
+        }
+
+        /** {@inheritDoc} */
+        @Override protected void close0(@Nullable Throwable th) {
+            if (th == null) {
+                if (log.isInfoEnabled())
+                    log.info("Local snapshot sender closed, resources released [dbNodeSnpDir=" + dbDir + ']');
+            }
+            else {
+                deleteSnapshot(snpLocDir, pdsSettings.folderName());
+
+                U.warn(log, "Local snapshot sender closed due to an error occurred", th);
+            }
+        }
+
+        /**
+         * @param from Source file to copy from.
+         * @param to Destination file to copy to.
+         * @param length Number of bytes to copy from the beginning of the file.
+         * @throws IOException If fails.
+         */
+        private void copy(File from, File to, long length) throws IOException {
+            try (FileIO src = ioFactory.create(from, READ);
+                 FileChannel dest = new FileOutputStream(to).getChannel()) {
+                if (src.size() < length) {
+                    throw new IgniteException("The source file to copy has to enough length " +
+                        "[expected=" + length + ", actual=" + src.size() + ']');
+                }
+
+                src.position(0);
+
+                long written = 0;
+
+                while (written < length)
+                    written += src.transferTo(written, length - written, dest);
+            }
+        }
+    }
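
Stripped of Ignite's page-store machinery, the delta replay in sendDelta0 above is a loop over
fixed-size records of a single file. A plain-JDK sketch under that reading; the page size is a
placeholder, the file path comes from the command line, and page-id extraction is elided:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class DeltaReplayDemo {
        public static void main(String[] args) throws IOException {
            int pageSize = 4096; // Must match the page size of the store that produced the delta.

            try (FileChannel delta = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
                long total = delta.size();

                assert total % pageSize == 0 : "Delta file must contain whole pages: " + total;

                ByteBuffer pageBuf = ByteBuffer.allocate(pageSize).order(ByteOrder.nativeOrder());

                for (long pos = 0; pos < total; pos += pageSize) {
                    pageBuf.clear();

                    // Read exactly one page, looping in case of short reads.
                    while (pageBuf.hasRemaining()) {
                        if (delta.read(pageBuf, pos + pageBuf.position()) < 0)
                            throw new IOException("Unexpected end of file at position: " + pos);
                    }

                    pageBuf.flip();

                    // A real implementation would extract the page id here and write the
                    // buffer into the partition store at that page's offset.
                    System.out.printf("page at offset %d, %d bytes%n", pos, pageBuf.limit());
                }
            }
        }
    }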
+
+    /** Snapshot start request for {@link DistributedProcess} initiate message. */
+    private static class SnapshotOperationRequest implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Unique snapshot request id. */
+        private final UUID rqId;
+
+        /** Source node id which triggered the request. */
+        private final UUID srcNodeId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** The list of cache groups to include into the snapshot. */
+        @GridToStringInclude
+        private final List<Integer> grpIds;
+
+        /** The set of baseline nodes affected by the snapshot operation. */
+        @GridToStringInclude
+        private final Set<UUID> bltNodes;
+
+        /** {@code true} if an execution of local snapshot tasks failed with an error. */
+        private volatile boolean hasErr;
+
+        /**
+         * @param rqId Unique snapshot request id.
+         * @param srcNodeId Source node id which triggered the request.
+         * @param snpName Snapshot name.
+         * @param grpIds Cache groups to include into the snapshot.
+         * @param bltNodes Baseline nodes affected by the snapshot operation.
+         */
+        public SnapshotOperationRequest(UUID rqId, UUID srcNodeId, String snpName, List<Integer> grpIds, Set<UUID> bltNodes) {
+            this.rqId = rqId;
+            this.srcNodeId = srcNodeId;
+            this.snpName = snpName;
+            this.grpIds = grpIds;
+            this.bltNodes = bltNodes;
+        }
+
+        /** {@inheritDoc} */
+        @Override public String toString() {
+            return S.toString(SnapshotOperationRequest.class, this);
+        }
+    }
+
+    /** */
+    private static class SnapshotOperationResponse implements Serializable {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+    }
+
+    /** Snapshot operation start message. */
+    private static class SnapshotStartDiscoveryMessage implements SnapshotDiscoveryMessage {
+        /** Serial version UID. */
+        private static final long serialVersionUID = 0L;
+
+        /** Discovery cache. */
+        private final DiscoCache discoCache;
+
+        /** Snapshot request id. */
+        private final IgniteUuid id;
+
+        /**
+         * @param discoCache Discovery cache.
+         * @param id Snapshot request id.
+         */
+        public SnapshotStartDiscoveryMessage(DiscoCache discoCache, UUID id) {
+            this.discoCache = discoCache;
+            this.id = new IgniteUuid(id, 0);
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needExchange() {
+            return true;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean needAssignPartitions() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public IgniteUuid id() {
+            return id;
+        }
+
+        /** {@inheritDoc} */
+        @Override public @Nullable DiscoveryCustomMessage ackMessage() {
+            return null;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean isMutable() {
+            return false;
+        }
+
+        /** {@inheritDoc} */
+        @Override public DiscoCache createDiscoCache(GridDiscoveryManager mgr, AffinityTopologyVersion topVer,
+            DiscoCache discoCache) {
+            return this.discoCache;
+        }
+
+        /** {@inheritDoc} */
+        @Override public boolean equals(Object o) {
+            if (this == o)
+                return true;
+
+            if (o == null || getClass() != o.getClass())
+                return false;
+
+            SnapshotStartDiscoveryMessage message = (SnapshotStartDiscoveryMessage)o;
 
 Review comment:
   I've removed `equals` and `hashCode` methods.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408710467
 
 

 ##########
 File path: modules/platforms/dotnet/Apache.Ignite.Core.Tests/Services/ServicesTest.cs
 ##########
 @@ -870,7 +870,7 @@ public void TestCallJavaService()
                 binSvc.testBinaryObject(
                     Grid1.GetBinary().ToBinary<IBinaryObject>(new PlatformComputeBinarizable {Field = 6}))
                     .GetField<int>("Field"));
-            
+
 
 Review comment:
   Still not fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409511679
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start grid node with data before each test.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Cache group ids mapped to the sets of partitions to be snapshot (null means all partitions of the group).
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently the only node in the cluster),
+        // so the snapshot partitions must contain the same data as the partitions left
+        // after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
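
The assertions above rely on the calculateCRC32Partitions test helper. Its presumable semantics,
a CRC32 per file keyed by file name, can be sketched with the plain JDK; this is a sketch of the
idea, not the actual helper:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    import java.util.zip.CRC32;

    final class Crc32PerFile {
        /** @return Map of file name to the CRC32 of its content, for every regular file in the directory. */
        static Map<String, Integer> crc32PerFile(Path dir) throws IOException {
            List<Path> files;

            try (Stream<Path> s = Files.list(dir)) {
                files = s.filter(Files::isRegularFile).collect(Collectors.toList());
            }

            Map<String, Integer> res = new HashMap<>();

            for (Path p : files) {
                CRC32 crc = new CRC32();
                byte[] buf = new byte[8192];

                try (InputStream in = Files.newInputStream(p)) {
                    for (int n; (n = in.read(buf)) > 0; )
                        crc.update(buf, 0, n);
                }

                res.put(p.getFileName().toString(), (int)crc.getValue());
            }

            return res;
        }
    }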
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency checked through a test node started right from snapshot directory and all values
+     * read successes.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start grid node with data before each test.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task, but do not schedule it on the checkpoint yet.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; it must be included into the snapshot with the correct value multiplier.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, valMultiplier * i));
+
+            // The snapshot is still in the INIT state. beforeCheckpoint has been skipped
+            // because a checkpoint is already running, and we need to schedule the next one
+            // right after the current one completes.
+            cctx.database().forceCheckpoint(String.format(CP_SNAPSHOT_REASON, SNAPSHOT_NAME));
+
+            snpFutTask.awaitStarted();
+
+            db.forceCheckpoint("snapshot is ready to be created")
+                .futureFor(CheckpointState.MARKER_STORED_TO_DISK)
+                .get();
+
+            // Change data after snapshot.
+            for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+                ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, 3 * i));
+
+            // On the next checkpoint the snapshot must copy pages to the delta file before writing them to a partition.
+            forceCheckpoint(ig);
+
+            slowCopy.countDown();
+
+            snpFutTask.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        // Now we can stop the node and check the created snapshots.
+        stopGrid(0);
+
+        cleanPersistenceDir(ig.name());
+
+        // Start Ignite instance from snapshot directory.
+        IgniteEx ig2 = startGridsFromSnapshot(1, SNAPSHOT_NAME);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++) {
+            assertEquals("snapshot data consistency violation [key=" + i + ']',
+                i * valMultiplier, ((TestOrderItem)ig2.cache(DEFAULT_CACHE_NAME).get(i)).value);
+        }
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitionNotEnoughSpace() throws Exception {
+        String errMsg = "Test exception. Not enough space.";
+        AtomicInteger throwCntr = new AtomicInteger();
+        RandomAccessFileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+        IgniteEx ig = startGridWithCache(dfltCacheCfg.setAffinity(new ZeroPartitionAffinityFunction()),
+            CACHE_KEYS_RANGE);
+
+        // Change data after backup.
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, 2 * i);
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+
+        IgniteSnapshotManager mgr = snp(ig);
+
+        mgr.ioFactory(new FileIOFactory() {
+            @Override public FileIO create(File file, OpenOption... modes) throws IOException {
+                FileIO fileIo = ioFactory.create(file, modes);
+
+                if (file.getName().equals(IgniteSnapshotManager.partDeltaFileName(0)))
+                    return new FileIODecorator(fileIo) {
+                        @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
+                            if (throwCntr.incrementAndGet() == 3)
+                                throw new IOException(errMsg);
+
+                            return super.writeFully(srcBuf);
+                        }
+                    };
+
+                return fileIo;
+            }
+        });
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        // Check that the expected exception is thrown.
+        assertThrowsAnyCause(log,
+            snpFut::get,
+            IOException.class,
+            errMsg);
+    }
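
The fault-injection approach above generalizes to any I/O failure: wrap the real FileIOFactory and fail a targeted write. A minimal sketch against the FileIOFactory/FileIODecorator API quoted in this test (the unconditional failure is illustrative):

    FileIOFactory delegateFactory = new RandomAccessFileIOFactory();

    FileIOFactory failingFactory = new FileIOFactory() {
        @Override public FileIO create(File file, OpenOption... modes) throws IOException {
            FileIO fileIo = delegateFactory.create(file, modes);

            // Decorate the real FileIO and simulate an out-of-space error on write.
            return new FileIODecorator(fileIo) {
                @Override public int writeFully(ByteBuffer srcBuf) throws IOException {
                    throw new IOException("Simulated: not enough space");
                }
            };
        }
    };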
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotCreateLocalCopyPartitionFail() throws Exception {
+        String errMsg = "Test. Fail to copy partition: ";
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), new HashSet<>(Collections.singletonList(0)));
+
+        IgniteSnapshotManager mgr0 = snp(ig);
+
+        IgniteInternalFuture<?> fut = startLocalSnapshotTask(ig.context().cache().context(),
+            SNAPSHOT_NAME,
+            parts,
+            new DelegateSnapshotSender(log, mgr0.snapshotExecutorService(),
+                mgr0.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    if (pair.getPartitionId() == 0)
+                        throw new IgniteException(errMsg + pair);
+
+                    delegate.sendPart0(part, cacheDirName, pair, length);
+                }
+            });
+
+        assertThrowsAnyCause(log,
+            fut::get,
+            IgniteException.class,
+            errMsg);
+    }
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemotePartitionsWithLoad() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        AtomicInteger cntr = new AtomicInteger();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, cntr.incrementAndGet());
+
+        GridCacheSharedContext<?, ?> cctx1 = grid(1).context().cache().context();
+        GridCacheDatabaseSharedManager db1 = (GridCacheDatabaseSharedManager)cctx1.database();
+
+        forceCheckpoint();
+
+        Map<String, Integer> rmtPartCRCs = new HashMap<>();
+        CountDownLatch cancelLatch = new CountDownLatch(1);
+
+        db1.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                SnapshotFutureTask task = cctx1.snapshotMgr().lastScheduledRemoteSnapshotTask(grid(0).localNode().id());
+
+                // Skip the first remote snapshot creation, since it will be cancelled.
+                if (task == null || cancelLatch.getCount() > 0)
+                    return;
+
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(task,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty()) {
+                    assert rmtPartCRCs.isEmpty();
+
+                    // Calculate the actual partition CRCs once the checkpoint has finished on this node.
+                    ctx.finishedStateFut().listen(f -> {
+                        File cacheWorkDir = ((FilePageStoreManager)grid(1).context().cache().context().pageStore())
+                            .cacheWorkDir(dfltCacheCfg);
+
+                        rmtPartCRCs.putAll(calculateCRC32Partitions(cacheWorkDir));
+                    });
+                }
+            }
+        });
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+        Map<String, Integer> snpPartCRCs = new HashMap<>();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            while (!Thread.currentThread().isInterrupted())
+                ig0.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(), cntr.incrementAndGet());
+        }, 5, "cache-loader-");
+
+        try {
+            // Snapshot must be taken on node1 and transmitted to node0.
+            IgniteInternalFuture<?> fut = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                new BiConsumer<File, GroupPartitionId>() {
+                    @Override public void accept(File file, GroupPartitionId gprPartId) {
+                        log.info("Snapshot partition received successfully [rmtNodeId=" + rmtNodeId +
+                            ", part=" + file.getAbsolutePath() + ", gprPartId=" + gprPartId + ']');
+
+                        cancelLatch.countDown();
+                    }
+                });
+
+            cancelLatch.await();
+
+            fut.cancel();
+
+            IgniteInternalFuture<?> fut2 = mgr0.requestRemoteSnapshot(rmtNodeId,
+                parts,
+                (part, pair) -> {
+                    try {
+                        snpPartCRCs.put(part.getName(), FastCrc.calcCrc(part));
+                    }
+                    catch (IOException e) {
+                        throw new IgniteException(e);
+                    }
+                });
+
+            fut2.get();
+        }
+        finally {
+            loadFut.cancel();
+        }
+
+        assertEquals("Partitions from remote node must have the same CRCs as those which have been received",
+            rmtPartCRCs, snpPartCRCs);
+    }
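
For readers who want to reproduce the comparison, here is a self-contained sketch of per-file CRC32 calculation in the spirit of calculateCRC32Partitions above (JDK-only; the helper name and the flat-directory assumption are ours):

    import java.io.*;
    import java.util.*;
    import java.util.zip.CRC32;

    static Map<String, Integer> crc32PerFile(File dir) throws IOException {
        Map<String, Integer> crcs = new HashMap<>();

        for (File f : Objects.requireNonNull(dir.listFiles(File::isFile))) {
            CRC32 crc = new CRC32();

            try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
                byte[] buf = new byte[8192];

                for (int n; (n = in.read(buf)) > 0; )
                    crc.update(buf, 0, n);
            }

            crcs.put(f.getName(), (int)crc.getValue()); // CRC32 fits in 32 bits.
        }

        return crcs;
    }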
+
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotRemoteOnBothNodes() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        forceCheckpoint(ig0);
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+        IgniteSnapshotManager mgr1 = snp(grid(1));
+
+        UUID node0 = grid(0).localNode().id();
+        UUID node1 = grid(1).localNode().id();
+
+        Map<Integer, Set<Integer>> fromNode1 = owningParts(ig0,
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node1);
+
+        Map<Integer, Set<Integer>> fromNode0 = owningParts(grid(1),
+            new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))),
+            node0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> futFrom1To0 = mgr0.requestRemoteSnapshot(node1, fromNode1,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode1.get(pair.getGroupId())
+                    .remove(pair.getPartitionId())));
+        IgniteInternalFuture<?> futFrom0To1 = mgr1.requestRemoteSnapshot(node0, fromNode0,
+            (part, pair) -> assertTrue("Received partition has not been requested", fromNode0.get(pair.getGroupId())
+                .remove(pair.getPartitionId())));
+
+        futFrom0To1.get();
+        futFrom1To0.get();
+
+        assertTrue("Not all of partitions have been received: " + fromNode1,
+            fromNode1.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+        assertTrue("Not all of partitions have been received: " + fromNode0,
+            fromNode0.get(CU.cacheId(DEFAULT_CACHE_NAME)).isEmpty());
+    }
+
+    /** @throws Exception If fails. */
+    @Test(expected = ClusterTopologyCheckedException.class)
+    public void testRemoteSnapshotRequestedNodeLeft() throws Exception {
+        IgniteEx ig0 = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+        IgniteEx ig1 = startGrid(1);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        CountDownLatch hold = new CountDownLatch(1);
+
+        ((GridCacheDatabaseSharedManager)ig1.context().cache().context().database())
+            .addCheckpointListener(new DbCheckpointListener() {
+                /** {@inheritDoc} */
+                @Override public void beforeCheckpointBegin(Context ctx) throws IgniteCheckedException {
+                    // The listener will be executed inside the checkpoint thread.
+                    U.await(hold);
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onMarkCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+
+                /** {@inheritDoc} */
+                @Override public void onCheckpointBegin(Context ctx) {
+                    // No-op.
+                }
+            });
+
+        UUID rmtNodeId = ig1.localNode().id();
+
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+        parts.put(CU.cacheId(DEFAULT_CACHE_NAME), null);
+
+        snp(ig0).requestRemoteSnapshot(rmtNodeId, parts, (part, grp) -> {});
+
+        IgniteInternalFuture<?>[] futs = new IgniteInternalFuture[1];
+
+        assertTrue(GridTestUtils.waitForCondition(new GridAbsPredicate() {
+            @Override public boolean apply() {
+                IgniteInternalFuture<Boolean> snpFut = snp(ig1)
+                    .lastScheduledRemoteSnapshotTask(ig0.localNode().id());
+
+                if (snpFut == null)
+                    return false;
+                else
+                    futs[0] = snpFut;
+
+                return true;
+            }
+        }, 5_000L));
+
+        stopGrid(0);
+
+        hold.countDown();
+
+        futs[0].get();
+    }
+
+    /**
+     * <pre>
+     * 1. Start 2 nodes.
+     * 2. Request a snapshot from the 2nd node.
+     * 3. Block the snapshot-request message.
+     * 4. Start a 3rd node and change the BLT.
+     * 5. Stop the 3rd node and change the BLT.
+     * 6. The 2nd node now has MOVING partitions to be preloaded.
+     * 7. Release the snapshot-request message.
+     * 8. Snapshot creation should fail, since MOVING partitions cannot be snapshotted.
+     * </pre>
+     *
+     * @throws Exception If fails.
+     */
+    @Test(expected = IgniteCheckedException.class)
+    public void testRemoteOutdatedSnapshot() throws Exception {
+        IgniteEx ig0 = startGrids(2);
+
+        ig0.cluster().state(ClusterState.ACTIVE);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig0.cache(DEFAULT_CACHE_NAME).put(i, i);
+
+        awaitPartitionMapExchange();
+
+        forceCheckpoint();
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);
+
+        UUID rmtNodeId = grid(1).localNode().id();
+
+        IgniteSnapshotManager mgr0 = snp(ig0);
+
+        // Snapshot must be taken on node1 and transmitted to node0.
+        IgniteInternalFuture<?> snpFut = mgr0.requestRemoteSnapshot(rmtNodeId,
+            owningParts(ig0, new HashSet<>(Collections.singletonList(CU.cacheId(DEFAULT_CACHE_NAME))), rmtNodeId),
+            (part, grp) -> {});
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .waitForBlocked();
+
+        startGrid(2);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        awaitPartitionMapExchange();
+
+        stopGrid(2);
+
+        TestRecordingCommunicationSpi.spi(grid(1))
+            .blockMessages((node, msg) ->  msg instanceof GridDhtPartitionDemandMessage);
+
+        ig0.cluster().setBaselineTopology(ig0.cluster().forServers().nodes());
+
+        TestRecordingCommunicationSpi.spi(ig0)
+            .stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);
+
+        snpFut.get();
+    }
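
Isolated, the block/release idiom that drives the race in the scenario above (API exactly as used in the test):

    TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(ig0);

    // Hold back the snapshot request while it is in flight.
    spi.blockMessages((node, msg) -> msg instanceof SnapshotRequestMessage);

    // ... trigger the operation that sends the message ...

    spi.waitForBlocked(); // The request is now parked on the sender.

    // ... change the topology so the parked request becomes outdated ...

    // Release the (now outdated) request and let it fail on the receiver.
    spi.stopBlock(true, obj -> obj.get2().message() instanceof SnapshotRequestMessage);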
+
+    /** @throws Exception If fails. */
+    @Test(expected = IgniteCheckedException.class)
+    public void testLocalSnapshotOnCacheStopped() throws Exception {
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, CACHE_KEYS_RANGE);
+
+        startGrid(1);
+
+        ig.cluster().state(ClusterState.ACTIVE);
+
+        awaitPartitionMapExchange();
+
+        GridCacheSharedContext<?, ?> cctx0 = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        CountDownLatch cpLatch = new CountDownLatch(1);
+
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx0,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(cpLatch);
+
+                            delegate.sendPart0(part, cacheDirName, pair, length);
 
 Review comment:
   Wrong indent


[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408829257
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create snapshot of the whole cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create local snapshot of requested cache groups and send it to the node which request this operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
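
From a user's point of view, the first action is exposed through the public snapshot API. A minimal usage sketch, assuming the IgniteSnapshot facade implemented below is reachable via Ignite#snapshot():

    // Trigger a cluster-wide snapshot and block until every node has written its copy.
    ignite.snapshot().createSnapshot("mySnapshot").get();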
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix for files with delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint to start the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads to perform local snapshots. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (creating a separate buffer per
+     * {@code SnapshotFutureTask.PageStoreSerialWriter} is redundant and can lead to OOM errors). A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, so off-heap memory can be exhausted before that.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Check the previously performed snapshot operation and delete uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if recovery process occurred for snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
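
Concretely, assuming PART_FILE_TEMPLATE is "part-%d.bin" and INDEX_FILE_NAME is "index.bin" (their usual values in FilePageStoreManager), the naming resolves as:

    partDeltaFileName(5);               // -> "part-5.bin.delta"
    partDeltaFileName(INDEX_PARTITION); // -> "index.bin.delta"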
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread, so no ordering guarantees are violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
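
The chunk handler registered above finalizes partition recovery exactly once, when the transferred byte count reaches the total announced in the transmission meta. The accounting idiom, isolated (applyChunk and finishRecovery are hypothetical stand-ins for the pageStore.write(...) and finishRecover(...) calls above):

    LongAdder transferred = new LongAdder();
    long expected = initMeta.count(); // Total bytes announced by the sender.

    Consumer<ByteBuffer> chunks = buff -> {
        applyChunk(buff); // Hypothetical; pageStore.write(...) in the handler above.

        transferred.add(buff.capacity());

        // Finish exactly once, when the last expected byte arrives.
        if (transferred.longValue() == expected)
            finishRecovery(); // Hypothetical; finishRecover(...) in the handler above.
    };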
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Traverse the snapshot marshaller directory and delete all files, tolerating concurrent removals.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
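
A usage sketch mirroring the call site in initLocalSnapshotEndStage below:

    // Remove the local copy of a snapshot for this node's folder.
    deleteSnapshot(snapshotLocalDir("mySnapshot"), pdsSettings.folderName());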
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Collection of cache group IDs and the corresponding cache partitions to be snapshotted.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Snapshot creation request.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
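
The method above is a double-checked read: the volatile field gives a lock-free fast path, and the mutex-guarded re-check closes the window where the future is already set but the discovery message has not yet produced a request. A minimal sketch of the pattern, with illustrative names:

    class SnapshotBusyFlag {
        private final Object mux = new Object();

        private volatile Object activeRequest; // Written from the discovery thread.
        private Object pendingFuture;          // Written only under mux.

        boolean isBusy() {
            // Fast path: a cheap volatile read, no locking.
            if (activeRequest != null)
                return true;

            // Slow path: closes the race with the thread that sets the future
            // under mux before the request becomes visible.
            synchronized (mux) {
                return activeRequest != null || pendingFuture != null;
            }
        }
    }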
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
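
For context, a caller-side sketch of the API exercised above. It assumes the public Ignite.snapshot() accessor introduced alongside this change and elides error handling:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterState;

    public class CreateSnapshotExample {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start("ignite-config.xml")) {
                ignite.cluster().state(ClusterState.ACTIVE);

                // Blocks until every baseline node has written its partition copies.
                ignite.snapshot().createSnapshot("backup_2020_04_01").get();
            }
        }
    }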
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
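
The two metastorage callbacks above implement a crash marker: a key is written before the snapshot starts and removed on success, so a key that survives a restart means the previous attempt crashed mid-way. A standalone sketch of the idea, with a plain in-memory map standing in for the durable metastorage:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class CrashMarkerSketch {
        static final String RUNNING_KEY = "snapshot-running";

        /** Stand-in for the durable metastorage (the real one survives restarts). */
        static final Map<String, String> metastorage = new ConcurrentHashMap<>();

        static void beforeSnapshot(String name) {
            metastorage.put(RUNNING_KEY, name);
        }

        static void onSuccess() {
            metastorage.remove(RUNNING_KEY);
        }

        static void onRestart() {
            String name = metastorage.remove(RUNNING_KEY);

            if (name != null)
                deleteIncompleteSnapshot(name); // The previous attempt crashed mid-way.
        }

        static void deleteIncompleteSnapshot(String name) { /* Cleanup of partial files. */ }
    }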
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange is started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // schedule task on checkpoint and wait when it starts
 
 Review comment:
  Capitalize the first word and end the comment with a period.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408313561
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistence caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups by triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested the operation.
+ *     Cache groups will be transmitted using internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** File with delta pages suffix. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for the checkpoint that starts a snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default snapshot directory for loading remote snapshots. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected due to the snapshot operation in progress.";
+
+    /** Error message used to finalize snapshot tasks. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads that perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote node. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of node-sender directory path with its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory which is currently sending its partitions. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect to receive. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (creating a buffer per each
+     * {@code SnapshotFutureTask.PageStoreSerialWriter} is redundant and can lead to OOM errors). A direct buffer
+     * is deallocated only when its ByteBuffer is garbage collected, so off-heap memory may run out before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
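
A standalone sketch of the one-buffer-per-thread pattern described above; the page size here is illustrative, while the real manager reads it from DataStorageConfiguration:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    class PageBufferPool {
        static final int PAGE_SIZE = 4096; // Illustrative value.

        // One page-sized direct buffer per thread, reused for every copy-on-write.
        static final ThreadLocal<ByteBuffer> LOC_BUFF = ThreadLocal.withInitial(() ->
            ByteBuffer.allocateDirect(PAGE_SIZE).order(ByteOrder.nativeOrder()));
    }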
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock to protect the resources being used. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Requested snapshot from remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes incomplete files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = this::localSnapshotSender;
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private GridFutureAdapter<Void> clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if a recovery process has occurred for the snapshot. */
+    private volatile boolean recovered;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::startLocalSnapshot,
+            this::startLocalSnapshotResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::endLocalSnapshot,
+            this::endLocalSnapshotResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
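
Assuming the usual part-%d.bin / index.bin templates from FilePageStoreManager, the naming convention resolves as follows:

    partDeltaFileName(5);               // -> "part-5.bin.delta"
    partDeltaFileName(INDEX_PARTITION); // -> "index.bin.delta"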
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = snapshotPath(ctx.config()).toFile();
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshots requests.
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // Task will also be removed from local map due to the listener on future done.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #takeSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread so that ordering guarantees are not violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
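
A minimal model of the byte accounting in the handler above: each received chunk is applied, and recovery is finalized exactly once, when the running total reaches the announced size (names are illustrative, not Ignite API):

    import java.nio.ByteBuffer;
    import java.util.concurrent.atomic.LongAdder;
    import java.util.function.Consumer;

    class ChunkAccounting {
        static Consumer<ByteBuffer> handler(long totalBytes, Consumer<ByteBuffer> applyPage, Runnable finishRecover) {
            LongAdder transferred = new LongAdder();

            return buf -> {
                applyPage.accept(buf);           // E.g. pageStore.write(...).
                transferred.add(buf.capacity());

                // The running total matches the announced size exactly once,
                // so recovery is finalized exactly once.
                if (transferred.longValue() == totalBytes)
                    finishRecover.run();
            };
        }
    }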
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if it has not been stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * Concurrently traverse the snapshot directory for given local node folder name and
 
 Review comment:
   Fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r409741526
 
 

 ##########
 File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManagerSelfTest.java
 ##########
 @@ -0,0 +1,770 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.file.OpenOption;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiConsumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
+import org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionState;
+import org.apache.ignite.internal.processors.cache.persistence.CheckpointState;
+import org.apache.ignite.internal.processors.cache.persistence.DbCheckpointListener;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.util.lang.GridAbsPredicate;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.cacheDirName;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.CP_SNAPSHOT_REASON;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.testframework.GridTestUtils.assertThrowsAnyCause;
+
+/**
+ * Default snapshot manager test.
+ */
+public class IgniteSnapshotManagerSelfTest extends AbstractSnapshotSelfTest {
+    /** @throws Exception If fails. */
+    @Test
+    public void testSnapshotLocalPartitions() throws Exception {
+        // Start grid node with data before each test.
+        IgniteEx ig = startGridWithCache(dfltCacheCfg, 2048);
+
+        // The following data will be included into checkpoint.
+        for (int i = 2048; i < 4096; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        for (int i = 4096; i < 8192; i++) {
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i) {
+                @Override public String toString() {
+                    return "_" + super.toString();
+                }
+            });
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+        IgniteSnapshotManager mgr = snp(ig);
+
+        // Map of cache group id to the set of its partitions to be snapshotted (null selects all partitions).
+        IgniteInternalFuture<?> snpFut = startLocalSnapshotTask(cctx,
+            SNAPSHOT_NAME,
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME));
+
+        snpFut.get();
+
+        File cacheWorkDir = ((FilePageStoreManager)ig.context()
+            .cache()
+            .context()
+            .pageStore())
+            .cacheWorkDir(dfltCacheCfg);
+
+        // A checkpoint is forced on cluster deactivation (currently only a single node in the cluster),
+        // so the snapshot partitions must contain the same data as the partitions left
+        // after the node stops.
+        stopGrid(ig.name());
+
+        // Calculate CRCs.
+        IgniteConfiguration cfg = ig.context().config();
+        PdsFolderSettings settings = ig.context().pdsFolderResolver().resolveFolders();
+        String nodePath = databaseRelativePath(settings.folderName());
+        File binWorkDir = resolveBinaryWorkDir(cfg.getWorkDirectory(), settings.folderName());
+        File marshWorkDir = mappingFileStoreWorkDir(U.workDirectory(cfg.getWorkDirectory(), cfg.getIgniteHome()));
+        File snpBinWorkDir = resolveBinaryWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath(), settings.folderName());
+        File snpMarshWorkDir = mappingFileStoreWorkDir(mgr.snapshotLocalDir(SNAPSHOT_NAME).getAbsolutePath());
+
+        final Map<String, Integer> origPartCRCs = calculateCRC32Partitions(cacheWorkDir);
+        final Map<String, Integer> snpPartCRCs = calculateCRC32Partitions(
+            FilePageStoreManager.cacheWorkDir(U.resolveWorkDirectory(mgr.snapshotLocalDir(SNAPSHOT_NAME)
+                    .getAbsolutePath(),
+                nodePath,
+                false),
+                cacheDirName(dfltCacheCfg)));
+
+        assertEquals("Partitions must have the same CRC after file copying and merging partition delta files",
+            origPartCRCs, snpPartCRCs);
+        assertEquals("Binary object mappings must be the same for local node and created snapshot",
+            calculateCRC32Partitions(binWorkDir), calculateCRC32Partitions(snpBinWorkDir));
+        assertEquals("Marshaller meta mast be the same for local node and created snapshot",
+            calculateCRC32Partitions(marshWorkDir), calculateCRC32Partitions(snpMarshWorkDir));
+
+        File snpWorkDir = mgr.snapshotTmpDir();
+
+        assertEquals("Snapshot working directory must be cleaned after usage", 0, snpWorkDir.listFiles().length);
+    }
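
calculateCRC32Partitions() is a test helper; a generic standalone equivalent over java.util.zip.CRC32 might look like the sketch below (the real helper relies on Ignite's FastCrc):

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.zip.CRC32;

    class Crc32PerFile {
        static Map<String, Integer> calculate(File dir) throws IOException {
            Map<String, Integer> res = new HashMap<>();

            for (File f : dir.listFiles(File::isFile)) {
                CRC32 crc = new CRC32();

                crc.update(Files.readAllBytes(f.toPath()));
                res.put(f.getName(), (int)crc.getValue());
            }

            return res;
        }
    }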
+
+    /**
+     * Test that all partitions are copied successfully even after multiple checkpoints occur during
+     * the long copy of cache partition files.
+     *
+     * Data consistency is checked by starting a test node right from the snapshot directory and verifying
+     * that all values are read successfully.
+     *
+     * @throws Exception If fails.
+     */
+    @Test
+    public void testSnapshotLocalPartitionMultiCpWithLoad() throws Exception {
+        int valMultiplier = 2;
+        CountDownLatch slowCopy = new CountDownLatch(1);
+
+        // Start grid node with data before each test.
+        IgniteEx ig = startGrid(0);
+
+        ig.cluster().baselineAutoAdjustEnabled(false);
+        ig.cluster().state(ClusterState.ACTIVE);
+        GridCacheSharedContext<?, ?> cctx = ig.context().cache().context();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            ig.cache(DEFAULT_CACHE_NAME).put(i, new TestOrderItem(i, i));
+
+        forceCheckpoint(ig);
+
+        AtomicInteger cntr = new AtomicInteger();
+        CountDownLatch ldrLatch = new CountDownLatch(1);
+        IgniteSnapshotManager mgr = snp(ig);
+        GridCacheDatabaseSharedManager db = (GridCacheDatabaseSharedManager)cctx.database();
+
+        IgniteInternalFuture<?> loadFut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                U.await(ldrLatch);
+
+                while (!Thread.currentThread().isInterrupted())
+                    ig.cache(DEFAULT_CACHE_NAME).put(cntr.incrementAndGet(),
+                        new TestOrderItem(cntr.incrementAndGet(), cntr.incrementAndGet()));
+            }
+            catch (IgniteInterruptedCheckedException e) {
+                log.warning("Loader has been interrupted", e);
+            }
+        }, 5, "cache-loader-");
+
+        // Register the task, but do not schedule it on the checkpoint.
+        SnapshotFutureTask snpFutTask = mgr.registerSnapshotTask(SNAPSHOT_NAME,
+            cctx.localNodeId(),
+            F.asMap(CU.cacheId(DEFAULT_CACHE_NAME), null),
+            new DelegateSnapshotSender(log, mgr.snapshotExecutorService(), mgr.localSnapshotSenderFactory().apply(SNAPSHOT_NAME)) {
+                @Override public void sendPart0(File part, String cacheDirName, GroupPartitionId pair, Long length) {
+                    try {
+                        U.await(slowCopy);
+
+                        delegate.sendPart0(part, cacheDirName, pair, length);
+                    }
+                    catch (IgniteInterruptedCheckedException e) {
+                        throw new IgniteException(e);
+                    }
+                }
+            });
+
+        db.addCheckpointListener(new DbCheckpointListener() {
+            /** {@inheritDoc} */
+            @Override public void beforeCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onMarkCheckpointBegin(Context ctx) {
+                // No-op.
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onCheckpointBegin(Context ctx) {
+                Map<Integer, Set<Integer>> processed = GridTestUtils.getFieldValue(snpFutTask,
+                    SnapshotFutureTask.class,
+                    "processed");
+
+                if (!processed.isEmpty())
+                    ldrLatch.countDown();
+            }
+        });
+
+        try {
+            snpFutTask.start();
+
+            // Change data before snapshot creation; this data must be included into the snapshot with the correct value multiplier.
 
 Review comment:
   Fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #7607: IGNITE-11073: Create consistent partitions copy on each cluster node
URL: https://github.com/apache/ignite/pull/7607#discussion_r408830539
 
 

 ##########
 File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
 ##########
 @@ -0,0 +1,1986 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Queue;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.Executor;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.function.BiConsumer;
+import java.util.function.BiFunction;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.events.DiscoveryEvent;
+import org.apache.ignite.failure.FailureContext;
+import org.apache.ignite.failure.FailureType;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.GridTopic;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.events.DiscoveryCustomEvent;
+import org.apache.ignite.internal.managers.communication.GridIoManager;
+import org.apache.ignite.internal.managers.communication.GridMessageListener;
+import org.apache.ignite.internal.managers.communication.TransmissionCancelledException;
+import org.apache.ignite.internal.managers.communication.TransmissionHandler;
+import org.apache.ignite.internal.managers.communication.TransmissionMeta;
+import org.apache.ignite.internal.managers.communication.TransmissionPolicy;
+import org.apache.ignite.internal.managers.discovery.DiscoCache;
+import org.apache.ignite.internal.managers.discovery.DiscoveryCustomMessage;
+import org.apache.ignite.internal.managers.discovery.GridDiscoveryManager;
+import org.apache.ignite.internal.managers.eventstorage.DiscoveryEventListener;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.CacheType;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.PartitionsExchangeAware;
+import org.apache.ignite.internal.processors.cache.persistence.StorageException;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.filename.PdsFolderSettings;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageLifecycleListener;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadOnlyMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.metastorage.ReadWriteMetastorage;
+import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;
+import org.apache.ignite.internal.processors.cache.persistence.tree.io.PageIO;
+import org.apache.ignite.internal.processors.cache.persistence.wal.crc.FastCrc;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.processors.marshaller.MappedName;
+import org.apache.ignite.internal.processors.metric.MetricRegistry;
+import org.apache.ignite.internal.processors.metric.impl.LongAdderMetric;
+import org.apache.ignite.internal.util.GridBusyLock;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.distributed.InitMessage;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.A;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.S;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.lang.IgniteUuid;
+import org.apache.ignite.thread.IgniteThreadPoolExecutor;
+import org.apache.ignite.thread.OomExceptionHandler;
+import org.jetbrains.annotations.Nullable;
+
+import static java.nio.file.StandardOpenOption.READ;
+import static org.apache.ignite.cluster.ClusterState.active;
+import static org.apache.ignite.configuration.IgniteConfiguration.DFLT_SNAPSHOT_DIRECTORY;
+import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
+import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;
+import static org.apache.ignite.internal.IgniteFeatures.PERSISTENCE_CACHE_SNAPSHOT;
+import static org.apache.ignite.internal.IgniteFeatures.nodeSupports;
+import static org.apache.ignite.internal.MarshallerContextImpl.mappingFileStoreWorkDir;
+import static org.apache.ignite.internal.MarshallerContextImpl.saveMappings;
+import static org.apache.ignite.internal.events.DiscoveryCustomEvent.EVT_DISCOVERY_CUSTOM_EVT;
+import static org.apache.ignite.internal.managers.communication.GridIoPolicy.SYSTEM_POOL;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.INDEX_PARTITION;
+import static org.apache.ignite.internal.pagemem.PageIdAllocator.MAX_PARTITION_ID;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.resolveBinaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.INDEX_FILE_NAME;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_TEMPLATE;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFile;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getPartitionFileName;
+import static org.apache.ignite.internal.processors.cache.persistence.filename.PdsConsistentIdProcessor.DB_DEFAULT_FOLDER;
+import static org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId.getFlagByPartId;
+import static org.apache.ignite.internal.util.IgniteUtils.isLocalNodeCoordinator;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.END_SNAPSHOT;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.START_SNAPSHOT;
+
+/**
+ * Internal implementation of snapshot operations over persistent caches.
+ * <p>
+ * There are two major actions available:
+ * <ul>
+ *     <li>Create a snapshot of all cluster cache groups, triggering PME to achieve consistency.</li>
+ *     <li>Create a local snapshot of the requested cache groups and send it to the node which requested the operation.
+ *     Cache groups will be transmitted using the internal API for transferring files. See {@link TransmissionHandler}.</li>
+ * </ul>
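+ * <p>
+ * As a minimal usage sketch (the {@code ignite.snapshot()} accessor is an assumption of this example;
+ * only the {@code createSnapshot} contract below is defined by this class):
+ * <pre>{@code
+ * // Request a cluster-wide snapshot and block until it completes on all baseline nodes.
+ * IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("backupSnapshot");
+ *
+ * fut.get();
+ * }</pre>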
+ */
+public class IgniteSnapshotManager extends GridCacheSharedManagerAdapter
+    implements IgniteSnapshot, PartitionsExchangeAware, MetastorageLifecycleListener {
+    /** Suffix of files containing delta pages. */
+    public static final String DELTA_SUFFIX = ".delta";
+
+    /** File name template for partition delta pages. */
+    public static final String PART_DELTA_TEMPLATE = PART_FILE_TEMPLATE + DELTA_SUFFIX;
+
+    /** File name template for index delta pages. */
+    public static final String INDEX_DELTA_NAME = INDEX_FILE_NAME + DELTA_SUFFIX;
+
+    /** Text reason for a checkpoint started to enforce the snapshot operation. */
+    public static final String CP_SNAPSHOT_REASON = "Checkpoint started to enforce snapshot operation: %s";
+
+    /** Name prefix for each remote snapshot operation. */
+    public static final String RMT_SNAPSHOT_PREFIX = "snapshot_";
+
+    /** Default directory name for temporary snapshot files, also used for snapshots loaded from remote nodes. */
+    public static final String DFLT_SNAPSHOT_TMP_DIR = "snp";
+
+    /** Timeout in milliseconds for snapshot operations. */
+    public static final long DFLT_SNAPSHOT_TIMEOUT = 15_000L;
+
+    /** Snapshot in progress error message. */
+    public static final String SNP_IN_PROGRESS_ERR_MSG = "Operation rejected because a snapshot operation is in progress.";
+
+    /** Error message used to finalize snapshot tasks when the local node is stopping. */
+    public static final String SNP_NODE_STOPPING_ERR_MSG = "Snapshot has been cancelled because the local node " +
+        "is stopping";
+
+    /** Metastorage key to save currently running snapshot. */
+    public static final String SNP_RUNNING_KEY = "snapshot-running";
+
+    /** Snapshot metrics prefix. */
+    public static final String SNAPSHOT_METRICS = "snapshot";
+
+    /** Prefix for snapshot threads. */
+    private static final String SNAPSHOT_RUNNER_THREAD_PREFIX = "snapshot-runner";
+
+    /** Total number of threads used to perform local snapshot operations. */
+    private static final int SNAPSHOT_THREAD_POOL_SIZE = 4;
+
+    /** Default snapshot topic to receive snapshots from remote nodes. */
+    private static final Object DFLT_INITIAL_SNAPSHOT_TOPIC = GridTopic.TOPIC_SNAPSHOT.topic("rmt_snp");
+
+    /** File transmission parameter of cache group id. */
+    private static final String SNP_GRP_ID_PARAM = "grpId";
+
+    /** File transmission parameter of cache partition id. */
+    private static final String SNP_PART_ID_PARAM = "partId";
+
+    /** File transmission parameter of the sender node's directory path including its consistentId (e.g. db/IgniteNode0). */
+    private static final String SNP_DB_NODE_PATH_PARAM = "dbNodePath";
+
+    /** File transmission parameter of the cache directory whose partitions are currently being sent. */
+    private static final String SNP_CACHE_DIR_NAME_PARAM = "cacheDirName";
+
+    /** Snapshot parameter name for a file transmission. */
+    private static final String SNP_NAME_PARAM = "snpName";
+
+    /** Total number of snapshot files the receiver should expect. */
+    private static final String SNP_PARTITIONS_CNT = "partsCnt";
+
+    /**
+     * Local buffer to perform copy-on-write operations with pages for {@code SnapshotFutureTask.PageStoreSerialWriter}s.
+     * It is important to have only one buffer per thread (instead of creating a buffer per
+     * each {@code SnapshotFutureTask.PageStoreSerialWriter}), since the latter is redundant and can lead to OOM
+     * errors. A direct buffer is deallocated only when its ByteBuffer is garbage collected, so off-heap memory
+     * can be exhausted before that happens.
+     */
+    private final ThreadLocal<ByteBuffer> locBuff;
+
+    /** Map of registered cache snapshot processes and their corresponding contexts. */
+    private final ConcurrentMap<String, SnapshotFutureTask> locSnpTasks = new ConcurrentHashMap<>();
+
+    /** Lock protecting the resources in use. */
+    private final GridBusyLock busyLock = new GridBusyLock();
+
+    /** Snapshot requested from a remote node. */
+    private final AtomicReference<RemoteSnapshotFuture> rmtSnpReq = new AtomicReference<>();
+
+    /** Mutex used to order cluster snapshot operation progress. */
+    private final Object snpOpMux = new Object();
+
+    /** Take snapshot operation procedure. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> startSnpProc;
+
+    /** Checks the previously performed snapshot operation and deletes uncompleted files if needed. */
+    private final DistributedProcess<SnapshotOperationRequest, SnapshotOperationResponse> endSnpProc;
+
+    /** Resolved persistent data storage settings. */
+    private volatile PdsFolderSettings pdsSettings;
+
+    /** Fully initialized metastorage. */
+    private volatile ReadWriteMetastorage metaStorage;
+
+    /** Local snapshot sender factory. */
+    private Function<String, SnapshotSender> locSndrFactory = localSnapshotSenderFactory();
+
+    /** Main snapshot directory to save created snapshots. */
+    private volatile File locSnpDir;
+
+    /**
+     * Working directory for snapshots loaded from remote nodes and for storing
+     * temporary partition delta files of a locally started snapshot process.
+     */
+    private File tmpWorkDir;
+
+    /** Factory for working with partition delta files as file storage. */
+    private volatile FileIOFactory ioFactory = new RandomAccessFileIOFactory();
+
+    /** Factory to create page store for restore. */
+    private volatile BiFunction<Integer, Boolean, FilePageStoreFactory> storeFactory;
+
+    /** Snapshot thread pool to perform local partition snapshots. */
+    private ExecutorService snpRunner;
+
+    /** System discovery message listener. */
+    private DiscoveryEventListener discoLsnr;
+
+    /** Cluster snapshot operation requested by user. */
+    private ClusterSnapshotFuture clusterSnpFut;
+
+    /** Current snapshot operation on local node. */
+    private volatile SnapshotOperationRequest clusterSnpRq;
+
+    /** {@code true} if a recovery process occurred for the snapshot. */
+    private volatile boolean recovered;
+
+    /** Last seen cluster snapshot operation. */
+    private volatile ClusterSnapshotFuture lastSeenSnpFut = new ClusterSnapshotFuture();
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public IgniteSnapshotManager(GridKernalContext ctx) {
+        locBuff = ThreadLocal.withInitial(() ->
+            ByteBuffer.allocateDirect(ctx.config().getDataStorageConfiguration().getPageSize())
+                .order(ByteOrder.nativeOrder()));
+
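+        // Two-phase distributed flow: the START_SNAPSHOT stage schedules and runs the local
+        // snapshot task on each baseline node, while the END_SNAPSHOT stage finalizes the result
+        // (deleting an uncompleted snapshot if any node reported an error).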
+        startSnpProc = new DistributedProcess<>(ctx, START_SNAPSHOT, this::initLocalSnapshotStartStage,
+            this::processLocalSnapshotStartStageResult);
+
+        endSnpProc = new DistributedProcess<>(ctx, END_SNAPSHOT, this::initLocalSnapshotEndStage,
+            this::processLocalSnapshotEndStageResult);
+    }
+
+    /**
+     * @param snapshotCacheDir Snapshot directory to store files.
+     * @param partId Cache partition identifier.
+     * @return A file representation.
+     */
+    public static File partDeltaFile(File snapshotCacheDir, int partId) {
+        return new File(snapshotCacheDir, partDeltaFileName(partId));
+    }
+
+    /**
+     * @param partId Partition id.
+     * @return File name of delta partition pages.
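+     * For example, assuming the standard {@code part-%d.bin} value of {@code PART_FILE_TEMPLATE},
+     * {@code partDeltaFileName(5)} yields {@code part-5.bin.delta}, and the index partition yields
+     * {@code index.bin.delta}.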
+     */
+    public static String partDeltaFileName(int partId) {
+        assert partId <= MAX_PARTITION_ID || partId == INDEX_PARTITION;
+
+        return partId == INDEX_PARTITION ? INDEX_DELTA_NAME : String.format(PART_DELTA_TEMPLATE, partId);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void start0() throws IgniteCheckedException {
+        super.start0();
+
+        GridKernalContext ctx = cctx.kernalContext();
+
+        if (ctx.clientNode())
+            return;
+
+        if (!CU.isPersistenceEnabled(ctx.config()))
+            return;
+
+        snpRunner = new IgniteThreadPoolExecutor(SNAPSHOT_RUNNER_THREAD_PREFIX,
+            cctx.igniteInstanceName(),
+            SNAPSHOT_THREAD_POOL_SIZE,
+            SNAPSHOT_THREAD_POOL_SIZE,
+            IgniteConfiguration.DFLT_THREAD_KEEP_ALIVE_TIME,
+            new LinkedBlockingQueue<>(),
+            SYSTEM_POOL,
+            new OomExceptionHandler(ctx));
+
+        assert cctx.pageStore() instanceof FilePageStoreManager;
+
+        FilePageStoreManager storeMgr = (FilePageStoreManager)cctx.pageStore();
+
+        pdsSettings = cctx.kernalContext().pdsFolderResolver().resolveFolders();
+
+        locSnpDir = resolveSnapshotWorkDirectory(ctx.config());
+        tmpWorkDir = Paths.get(storeMgr.workDir().getAbsolutePath(), DFLT_SNAPSHOT_TMP_DIR).toFile();
+
+        U.ensureDirectory(locSnpDir, "snapshot work directory", log);
+        U.ensureDirectory(tmpWorkDir, "temp directory for snapshot creation", log);
+
+        MetricRegistry mreg = cctx.kernalContext().metric().registry(SNAPSHOT_METRICS);
+
+        mreg.register("LastSnapshotStartTime", () -> lastSeenSnpFut.startTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been started.");
+        mreg.register("LastSnapshotEndTime", () -> lastSeenSnpFut.endTime,
+            "The system time approximated by 10 ms when the last cluster snapshot operation has been finished.");
+        mreg.register("LastSnapshotName", () -> lastSeenSnpFut.name, String.class,
+            "The name of last started cluster snapshot operation.");
+        mreg.register("LastSnapshotErrorMessage",
+            () -> lastSeenSnpFut.error() == null ? null : lastSeenSnpFut.error().getMessage(),
+            String.class,
+            "The error message of last started cluster snapshot operation which fail. This value will be 'null' " +
+                "if last snapshot operation completed successfully.");
+        mreg.register("localSnapshotList", this::getSnapshots, List.class,
+            "The list of all known snapshots currently saved on the local node with respect to " +
+                "configured via IgniteConfiguration a snapshot path.");
+
+        storeFactory = storeMgr::getPageStoreFactory;
+
+        cctx.exchange().registerExchangeAwareComponent(this);
+        ctx.internalSubscriptionProcessor().registerMetastorageListener(this);
+
+        // Receive remote snapshot requests.
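+        // Message flow sketch: a SnapshotRequestMessage schedules a local snapshot task whose
+        // partitions are streamed back to the requesting node, while a SnapshotResponseMessage
+        // carries a remote error and completes the pending RemoteSnapshotFuture exceptionally.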
+        cctx.gridIO().addMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC, new GridMessageListener() {
+            @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
+                if (!busyLock.enterBusy())
+                    return;
+
+                try {
+                    if (msg instanceof SnapshotRequestMessage) {
+                        SnapshotRequestMessage reqMsg0 = (SnapshotRequestMessage)msg;
+                        String snpName = reqMsg0.snapshotName();
+
+                        synchronized (this) {
+                            SnapshotFutureTask task = lastScheduledRemoteSnapshotTask(nodeId);
+
+                            if (task != null) {
+                                // The task will also be removed from the local map by the listener invoked on future completion.
+                                task.cancel();
+
+                                log.info("Snapshot request has been cancelled due to another request received " +
+                                    "[prevSnpResp=" + task + ", msg0=" + reqMsg0 + ']');
+                            }
+                        }
+
+                        SnapshotFutureTask task = registerSnapshotTask(snpName,
+                            nodeId,
+                            reqMsg0.parts(),
+                            remoteSnapshotSender(snpName, nodeId));
+
+                        task.listen(f -> {
+                            if (f.error() == null)
+                                return;
+
+                            U.error(log, "Failed to process request of creating a snapshot " +
+                                "[from=" + nodeId + ", msg=" + reqMsg0 + ']', f.error());
+
+                            try {
+                                cctx.gridIO().sendToCustomTopic(nodeId,
+                                    DFLT_INITIAL_SNAPSHOT_TOPIC,
+                                    new SnapshotResponseMessage(reqMsg0.snapshotName(), f.error().getMessage()),
+                                    SYSTEM_POOL);
+                            }
+                            catch (IgniteCheckedException ex0) {
+                                U.error(log, "Fail to send the response message with processing snapshot request " +
+                                    "error [request=" + reqMsg0 + ", nodeId=" + nodeId + ']', ex0);
+                            }
+                        });
+
+                        task.start();
+                    }
+                    else if (msg instanceof SnapshotResponseMessage) {
+                        SnapshotResponseMessage respMsg0 = (SnapshotResponseMessage)msg;
+
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.snpName.equals(respMsg0.snapshotName())) {
+                            if (log.isInfoEnabled()) {
+                                log.info("A stale snapshot response message has been received. Will be ignored " +
+                                    "[fromNodeId=" + nodeId + ", response=" + respMsg0 + ']');
+                            }
+
+                            return;
+                        }
+
+                        if (respMsg0.errorMessage() != null) {
+                            fut0.onDone(new IgniteCheckedException("Request cancelled. The snapshot operation stopped " +
+                                "on the remote node with an error: " + respMsg0.errorMessage()));
+                        }
+                    }
+                }
+                catch (Throwable e) {
+                    U.error(log, "Processing snapshot request from remote node fails with an error", e);
+
+                    cctx.kernalContext().failure().process(new FailureContext(FailureType.CRITICAL_ERROR, e));
+                }
+                finally {
+                    busyLock.leaveBusy();
+                }
+            }
+        });
+
+        cctx.gridEvents().addDiscoveryEventListener(discoLsnr = (evt, discoCache) -> {
+            if (!busyLock.enterBusy())
+                return;
+
+            try {
+                UUID leftNodeId = evt.eventNode().id();
+
+                if (evt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
+                    DiscoveryCustomEvent evt0 = (DiscoveryCustomEvent)evt;
+
+                    if (evt0.customMessage() instanceof InitMessage) {
+                        InitMessage<?> msg = (InitMessage<?>)evt0.customMessage();
+
+                        // This happens when the #createSnapshot() method has already been invoked and the
+                        // distributed process starts its action.
+                        if (msg.type() == START_SNAPSHOT.ordinal()) {
+                            assert clusterSnpRq != null ||
+                                !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()) : evt;
+
+                            DiscoveryCustomEvent customEvt = new DiscoveryCustomEvent();
+
+                            customEvt.node(evt0.node());
+                            customEvt.eventNode(evt0.eventNode());
+                            customEvt.affinityTopologyVersion(evt0.affinityTopologyVersion());
+                            customEvt.customMessage(new SnapshotStartDiscoveryMessage(discoCache, msg.processId()));
+
+                            // Handle the new event inside the discovery thread, so no ordering guarantees are violated.
+                            cctx.exchange().onDiscoveryEvent(customEvt, discoCache);
+                        }
+                    }
+                }
+                else if (evt.type() == EVT_NODE_LEFT || evt.type() == EVT_NODE_FAILED) {
+                    SnapshotOperationRequest snpRq = clusterSnpRq;
+
+                    for (SnapshotFutureTask sctx : locSnpTasks.values()) {
+                        if (sctx.sourceNodeId().equals(leftNodeId) ||
+                            (snpRq != null &&
+                                snpRq.snpName.equals(sctx.snapshotName()) &&
+                                snpRq.bltNodes.contains(leftNodeId))) {
+                            sctx.acceptException(new ClusterTopologyCheckedException("The node which requested snapshot " +
+                                "creation has left the grid"));
+                        }
+                    }
+
+                    RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                    if (snpTrFut != null && snpTrFut.rmtNodeId.equals(leftNodeId)) {
+                        snpTrFut.onDone(new ClusterTopologyCheckedException("The node from which a snapshot has been " +
+                            "requested left the grid"));
+                    }
+                }
+            }
+            finally {
+                busyLock.leaveBusy();
+            }
+        }, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_DISCOVERY_CUSTOM_EVT);
+
+        // Remote snapshot handler.
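+        // Receiver side of the partition transmission: fileHandler consumes the raw partition
+        // file, chunkHandler applies the received delta pages on top of it, and finishRecover
+        // completes a partition once all delta bytes have been applied.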
+        cctx.kernalContext().io().addTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC, new TransmissionHandler() {
+            @Override public void onEnd(UUID nodeId) {
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                assert snpTrFut.stores.isEmpty() : snpTrFut.stores.entrySet();
+                assert snpTrFut.partsLeft == 0 : snpTrFut;
+
+                snpTrFut.onDone();
+
+                log.info("Requested snapshot from remote node has been fully received " +
+                    "[snpName=" + snpTrFut.snpName + ", snpTrans=" + snpTrFut + ']');
+            }
+
+            /** {@inheritDoc} */
+            @Override public void onException(UUID nodeId, Throwable err) {
+                RemoteSnapshotFuture fut = rmtSnpReq.get();
+
+                if (fut == null)
+                    return;
+
+                if (fut.rmtNodeId.equals(nodeId))
+                    fut.onDone(err);
+            }
+
+            /** {@inheritDoc} */
+            @Override public String filePath(UUID nodeId, TransmissionMeta fileMeta) {
+                Integer partId = (Integer)fileMeta.params().get(SNP_PART_ID_PARAM);
+                String rmtDbNodePath = (String)fileMeta.params().get(SNP_DB_NODE_PATH_PARAM);
+                String cacheDirName = (String)fileMeta.params().get(SNP_CACHE_DIR_NAME_PARAM);
+
+                RemoteSnapshotFuture transFut = resolve(nodeId, fileMeta);
+
+                try {
+                    File cacheDir = U.resolveWorkDirectory(tmpWorkDir.getAbsolutePath(),
+                        Paths.get(transFut.snpName, rmtDbNodePath, cacheDirName).toString(),
+                        false);
+
+                    return new File(cacheDir, getPartitionFileName(partId)).getAbsolutePath();
+                }
+                catch (IgniteCheckedException e) {
+                    throw new IgniteException(e);
+                }
+            }
+
+            /**
+             * @param nodeId Remote node id.
+             * @param meta Transmission meta.
+             * @return Resolved transmission future.
+             */
+            private RemoteSnapshotFuture resolve(UUID nodeId, TransmissionMeta meta) {
+                String snpName = (String)meta.params().get(SNP_NAME_PARAM);
+                Integer partsCnt = (Integer)meta.params().get(SNP_PARTITIONS_CNT);
+
+                RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+                if (snpTrFut == null || !snpTrFut.snpName.equals(snpName)) {
+                    throw new TransmissionCancelledException("Stale snapshot transmission will be ignored " +
+                        "[snpName=" + snpName + ", meta=" + meta + ", snpTrFut=" + snpTrFut + ']');
+                }
+
+                assert snpTrFut.snpName.equals(snpName) && snpTrFut.rmtNodeId.equals(nodeId) :
+                    "Another transmission in progress [snpTrFut=" + snpTrFut + ", nodeId=" + snpName + ']';
+
+                if (snpTrFut.partsLeft == -1)
+                    snpTrFut.partsLeft = partsCnt;
+
+                return snpTrFut;
+            }
+
+            /**
+             * @param snpTrans Current snapshot transmission.
+             * @param grpPartId Pair of group id and its partition id.
+             */
+            private void finishRecover(RemoteSnapshotFuture snpTrans, GroupPartitionId grpPartId) {
+                FilePageStore pageStore = null;
+
+                try {
+                    pageStore = snpTrans.stores.remove(grpPartId);
+
+                    pageStore.finishRecover();
+
+                    snpTrans.partConsumer.accept(new File(pageStore.getFileAbsolutePath()), grpPartId);
+
+                    snpTrans.partsLeft--;
+                }
+                catch (StorageException e) {
+                    throw new IgniteException(e);
+                }
+                finally {
+                    U.closeQuiet(pageStore);
+                }
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<ByteBuffer> chunkHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+
+                RemoteSnapshotFuture snpTrFut = resolve(nodeId, initMeta);
+
+                GroupPartitionId grpPartId = new GroupPartitionId(grpId, partId);
+                FilePageStore pageStore = snpTrFut.stores.get(grpPartId);
+
+                if (pageStore == null) {
+                    throw new IgniteException("Partition must be loaded before applying snapshot delta pages " +
+                        "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                pageStore.beginRecover();
+
+                // No snapshot delta pages received. Finalize recovery.
+                if (initMeta.count() == 0)
+                    finishRecover(snpTrFut, grpPartId);
+
+                return new Consumer<ByteBuffer>() {
+                    final LongAdder transferred = new LongAdder();
+
+                    @Override public void accept(ByteBuffer buff) {
+                        try {
+                            assert initMeta.count() != 0 : initMeta;
+
+                            RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                            if (fut0 == null || !fut0.equals(snpTrFut) || fut0.isCancelled()) {
+                                throw new TransmissionCancelledException("Snapshot request is cancelled " +
+                                    "[snpName=" + snpTrFut.snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                            }
+
+                            pageStore.write(PageIO.getPageId(buff), buff, 0, false);
+
+                            transferred.add(buff.capacity());
+
+                            if (transferred.longValue() == initMeta.count())
+                                finishRecover(snpTrFut, grpPartId);
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                    }
+                };
+            }
+
+            /** {@inheritDoc} */
+            @Override public Consumer<File> fileHandler(UUID nodeId, TransmissionMeta initMeta) {
+                Integer grpId = (Integer)initMeta.params().get(SNP_GRP_ID_PARAM);
+                Integer partId = (Integer)initMeta.params().get(SNP_PART_ID_PARAM);
+                String snpName = (String)initMeta.params().get(SNP_NAME_PARAM);
+
+                assert grpId != null;
+                assert partId != null;
+                assert snpName != null;
+                assert storeFactory != null;
+
+                RemoteSnapshotFuture transFut = rmtSnpReq.get();
+
+                if (transFut == null) {
+                    throw new IgniteException("Snapshot transmission with given name doesn't exists " +
+                        "[snpName=" + snpName + ", grpId=" + grpId + ", partId=" + partId + ']');
+                }
+
+                return new Consumer<File>() {
+                    @Override public void accept(File file) {
+                        RemoteSnapshotFuture fut0 = rmtSnpReq.get();
+
+                        if (fut0 == null || !fut0.equals(transFut) || fut0.isCancelled()) {
+                            throw new TransmissionCancelledException("Snapshot request is cancelled [snpName=" + snpName +
+                                ", grpId=" + grpId + ", partId=" + partId + ']');
+                        }
+
+                        busyLock.enterBusy();
+
+                        try {
+                            FilePageStore pageStore = (FilePageStore)storeFactory
+                                .apply(grpId, false)
+                                .createPageStore(getFlagByPartId(partId),
+                                    file::toPath,
+                                    new LongAdderMetric("NO_OP", null));
+
+                            transFut.stores.put(new GroupPartitionId(grpId, partId), pageStore);
+
+                            pageStore.init();
+                        }
+                        catch (IgniteCheckedException e) {
+                            throw new IgniteException(e);
+                        }
+                        finally {
+                            busyLock.leaveBusy();
+                        }
+                    }
+                };
+            }
+        });
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void stop0(boolean cancel) {
+        busyLock.block();
+
+        try {
+            // Try to stop all snapshot processing if not stopped yet.
+            for (SnapshotFutureTask sctx : locSnpTasks.values())
+                sctx.acceptException(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+            locSnpTasks.clear();
+
+            RemoteSnapshotFuture snpTrFut = rmtSnpReq.get();
+
+            if (snpTrFut != null)
+                snpTrFut.cancel();
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null) {
+                    clusterSnpFut.onDone(new NodeStoppingException(SNP_NODE_STOPPING_ERR_MSG));
+
+                    clusterSnpFut = null;
+                }
+            }
+
+            if (snpRunner != null)
+                snpRunner.shutdownNow();
+
+            cctx.kernalContext().io().removeMessageListener(DFLT_INITIAL_SNAPSHOT_TOPIC);
+            cctx.kernalContext().io().removeTransmissionHandler(DFLT_INITIAL_SNAPSHOT_TOPIC);
+
+            if (discoLsnr != null)
+                cctx.kernalContext().event().removeDiscoveryEventListener(discoLsnr);
+
+            cctx.exchange().unregisterExchangeAwareComponent(this);
+        }
+        finally {
+            busyLock.unblock();
+        }
+    }
+
+    /**
+     * @param snpDir Snapshot directory.
+     * @param folderName Local node folder name (see {@link U#maskForFileName} with consistent id).
+     */
+    public static void deleteSnapshot(File snpDir, String folderName) {
+        if (!snpDir.exists())
+            return;
+
+        assert snpDir.isDirectory() : snpDir;
+
+        try {
+            File binDir = resolveBinaryWorkDir(snpDir.getAbsolutePath(), folderName);
+            File dbDir = U.resolveWorkDirectory(snpDir.getAbsolutePath(), databaseRelativePath(folderName), false);
+
+            U.delete(binDir);
+            U.delete(dbDir);
+
+            File marshDir = mappingFileStoreWorkDir(snpDir.getAbsolutePath());
+
+            // Recursively traverse the snapshot marshaller directory and delete all files.
+            Files.walkFileTree(marshDir.toPath(), new SimpleFileVisitor<Path>() {
+                @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+                    U.delete(file);
+
+                    return FileVisitResult.CONTINUE;
+                }
+
+                @Override public FileVisitResult visitFileFailed(Path file, IOException exc) {
+                    // Skip files which can be concurrently removed from FileTree.
+                    return FileVisitResult.CONTINUE;
+                }
+            });
+
+            File db = new File(snpDir, DB_DEFAULT_FOLDER);
+
+            if (!db.exists() || db.list().length == 0)
+                U.delete(snpDir);
+        }
+        catch (IOException | IgniteCheckedException e) {
+            throw new IgniteException(e);
+        }
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @return Local snapshot directory for snapshot with given name.
+     */
+    public File snapshotLocalDir(String snpName) {
+        assert locSnpDir != null;
+
+        return new File(locSnpDir, snpName);
+    }
+
+    /**
+     * @return Node snapshot temporary working directory.
+     */
+    public File snapshotTmpDir() {
+        assert tmpWorkDir != null;
+
+        return tmpWorkDir;
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when a snapshot has been started.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotStartStage(SnapshotOperationRequest req) {
+        if (cctx.kernalContext().clientNode() ||
+            !CU.baselineNode(cctx.localNode(), cctx.kernalContext().state().clusterState()))
+            return new GridFinishedFuture<>();
+
+        // Executed inside discovery notifier thread, prior to firing discovery custom event,
+        // so it is safe to set new snapshot task inside this method without synchronization.
+        if (clusterSnpRq != null) {
+            return new GridFinishedFuture<>(new IgniteCheckedException("Snapshot operation has been rejected. " +
+                "Another snapshot operation in progress [req=" + req + ", curr=" + clusterSnpRq + ']'));
+        }
+
+        // Map of cache group ids to the sets of cache partitions to be included in the snapshot.
+        Map<Integer, Set<Integer>> parts = new HashMap<>();
+
+        for (Integer grpId : req.grpIds)
+            parts.put(grpId, null);
+
+        SnapshotFutureTask task0 = registerSnapshotTask(req.snpName,
+            req.srcNodeId,
+            parts,
+            locSndrFactory.apply(req.snpName));
+
+        clusterSnpRq = req;
+
+        return task0.chain(f -> new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotStartStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        if (!snpRq.rqId.equals(id)) {
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && clusterSnpFut.rqId.equals(id)) {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot operation failed due to another snapshot " +
+                        "operation in progress: " + snpRq.snpName));
+
+                    clusterSnpFut = null;
+                }
+
+                return;
+            }
+        }
+
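+        // Only the coordinator node aggregates the first-stage results and triggers the end stage.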
+        if (isLocalNodeCoordinator(cctx.discovery())) {
+            Set<UUID> missed = new HashSet<>(snpRq.bltNodes);
+            missed.removeAll(res.keySet());
+            missed.removeAll(err.keySet());
+
+            snpRq.hasErr = !F.isEmpty(err) || !missed.isEmpty();
+
+            if (snpRq.hasErr) {
+                U.warn(log, "Execution of local snapshot tasks fails or them haven't been executed " +
+                    "due to some of nodes left the cluster. Uncompleted snapshot will be deleted " +
+                    "[err=" + err + ", missed=" + missed + ']');
+            }
+
+            endSnpProc.start(UUID.randomUUID(), snpRq);
+        }
+    }
+
+    /**
+     * @param req Request on snapshot creation.
+     * @return Future which will be completed when the snapshot is finalized.
+     */
+    private IgniteInternalFuture<SnapshotOperationResponse> initLocalSnapshotEndStage(SnapshotOperationRequest req) {
+        if (clusterSnpRq == null)
+            return new GridFinishedFuture<>(new SnapshotOperationResponse());
+
+        try {
+            if (req.hasErr)
+                deleteSnapshot(snapshotLocalDir(req.snpName), pdsSettings.folderName());
+
+            removeLastMetaStorageKey();
+        }
+        catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        return new GridFinishedFuture<>(new SnapshotOperationResponse());
+    }
+
+    /**
+     * @param id Request id.
+     * @param res Results.
+     * @param err Errors.
+     */
+    private void processLocalSnapshotEndStageResult(UUID id, Map<UUID, SnapshotOperationResponse> res, Map<UUID, Exception> err) {
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        if (snpRq == null)
+            return;
+
+        Set<UUID> endFail = new HashSet<>(snpRq.bltNodes);
+        endFail.removeAll(res.keySet());
+
+        clusterSnpRq = null;
+
+        synchronized (snpOpMux) {
+            if (clusterSnpFut != null) {
+                if (endFail.isEmpty() && !snpRq.hasErr) {
+                    clusterSnpFut.onDone();
+
+                    if (log.isInfoEnabled())
+                        log.info("Cluster-wide snapshot operation finished successfully [req=" + snpRq + ']');
+                }
+                else {
+                    clusterSnpFut.onDone(new IgniteCheckedException("Snapshot creation has been finished with an error. " +
+                        "Local snapshot tasks may not finished completely or finalizing results fails " +
+                        "[hasErr" + snpRq.hasErr + ", fail=" + endFail + ']'));
+                }
+
+                clusterSnpFut = null;
+            }
+        }
+    }
+
+    /**
+     * @return {@code True} if snapshot operation is in progress.
+     */
+    public boolean isSnapshotCreating() {
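+        // Fast-path check of the volatile field before taking the mutex.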
+        if (clusterSnpRq != null)
+            return true;
+
+        synchronized (snpOpMux) {
+            return clusterSnpRq != null || clusterSnpFut != null;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public List<String> getSnapshots() {
+        if (cctx.kernalContext().clientNode())
+            throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+        synchronized (snpOpMux) {
+            return Arrays.stream(locSnpDir.listFiles(File::isDirectory))
+                .map(File::getName)
+                .collect(Collectors.toList());
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> createSnapshot(String name) {
+        A.notNullOrEmpty(name, "name");
+
+        try {
+            if (cctx.kernalContext().clientNode())
+                throw new UnsupportedOperationException("Client and daemon nodes can not perform this operation.");
+
+            if (!IgniteFeatures.allNodesSupports(cctx.discovery().allNodes(), PERSISTENCE_CACHE_SNAPSHOT))
+                throw new IgniteException("Not all nodes in the cluster support a snapshot operation.");
+
+            if (!active(cctx.kernalContext().state().clusterState().state()))
+                throw new IgniteException("Snapshot operation has been rejected. The cluster is inactive.");
+
+            DiscoveryDataClusterState clusterState = cctx.kernalContext().state().clusterState();
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException("Snapshot operation has been rejected. The baseline topology is not configured for cluster.");
+
+            ClusterSnapshotFuture snpFut0;
+
+            synchronized (snpOpMux) {
+                if (clusterSnpFut != null && !clusterSnpFut.isDone())
+                    throw new IgniteException("Create snapshot request has been rejected. The previous snapshot operation was not completed.");
+
+                if (clusterSnpRq != null)
+                    throw new IgniteException("Create snapshot request has been rejected. Parallel snapshot processes are not allowed.");
+
+                if (getSnapshots().contains(name))
+                    throw new IgniteException("Create snapshot request has been rejected. Snapshot with given name already exists.");
+
+                snpFut0 = new ClusterSnapshotFuture(UUID.randomUUID(), name);
+
+                clusterSnpFut = snpFut0;
+                lastSeenSnpFut = snpFut0;
+            }
+
+            List<Integer> grps = cctx.cache().persistentGroups().stream()
+                .filter(g -> cctx.cache().cacheType(g.cacheOrGroupName()) == CacheType.USER)
+                .filter(g -> !g.config().isEncryptionEnabled())
+                .map(CacheGroupDescriptor::groupId)
+                .collect(Collectors.toList());
+
+            List<ClusterNode> srvNodes = cctx.discovery().serverNodes(AffinityTopologyVersion.NONE);
+
+            startSnpProc.start(snpFut0.rqId, new SnapshotOperationRequest(snpFut0.rqId,
+                cctx.localNodeId(),
+                name,
+                grps,
+                new HashSet<>(F.viewReadOnly(srvNodes,
+                    F.node2id(),
+                    (node) -> CU.baselineNode(node, clusterState)))));
+
+            if (log.isInfoEnabled())
+                log.info("Cluster-wide snapshot operation started [snpName=" + name + ", grps=" + grps + ']');
+
+            return new IgniteFutureImpl<>(snpFut0);
+        }
+        catch (Exception e) {
+            U.error(log, "Start snapshot operation failed", e);
+
+            lastSeenSnpFut = new ClusterSnapshotFuture(name, e);
+
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForReadWrite(ReadWriteMetastorage metaStorage) throws IgniteCheckedException {
+        synchronized (snpOpMux) {
+            this.metaStorage = metaStorage;
+
+            if (recovered)
+                removeLastMetaStorageKey();
+
+            recovered = false;
+        }
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onReadyForRead(ReadOnlyMetastorage metaStorage) throws IgniteCheckedException {
+        // A snapshot that was not completed because the local node crashed must be deleted.
+        String snpName = (String)metaStorage.read(SNP_RUNNING_KEY);
+
+        if (snpName == null)
+            return;
+
+        recovered = true;
+
+        for (File tmp : snapshotTmpDir().listFiles())
+            U.delete(tmp);
+
+        deleteSnapshot(snapshotLocalDir(snpName), pdsSettings.folderName());
+
+        if (log.isInfoEnabled()) {
+            log.info("Previous attempt to create snapshot fail due to the local node crash. All resources " +
+                "related to snapshot operation have been deleted: " + snpName);
+        }
+    }
+
+    /**
+     * @param evt Discovery event to check.
+     * @return {@code true} if the exchange was started by a snapshot operation.
+     */
+    public static boolean isSnapshotOperation(DiscoveryEvent evt) {
+        return !evt.eventNode().isClient() &&
+            evt.type() == EVT_DISCOVERY_CUSTOM_EVT &&
+            ((DiscoveryCustomEvent)evt).customMessage() instanceof SnapshotStartDiscoveryMessage;
+    }
+
+    /** {@inheritDoc} */
+    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
+        if (clusterSnpRq == null || cctx.kernalContext().clientNode())
+            return;
+
+        SnapshotOperationRequest snpRq = clusterSnpRq;
+
+        SnapshotFutureTask task = locSnpTasks.get(snpRq.snpName);
+
+        if (task == null)
+            return;
+
+        if (task.start()) {
+            cctx.database().forceCheckpoint(String.format("Start snapshot operation: %s", snpRq.snpName));
+
+            // Schedule the task on the checkpoint and wait until it starts.
+            try {
+                task.awaitStarted();
+            }
+            catch (IgniteCheckedException e) {
+                U.error(log, "Fail to wait while cluster-wide snapshot operation started", e);
+            }
+        }
+    }
+
+    /**
+     * @param parts Collection of cache group and partition pairs to be included in the snapshot.
+     * @param rmtNodeId The remote node to connect to.
+     * @param partConsumer Received partition handler.
+     * @return Future which will be completed when the requested snapshot is fully received.
+     */
+    public IgniteInternalFuture<Void> createRemoteSnapshot(
 
 Review comment:
   Perhaps `requestRemoteSnapshot` is a better name. WDYT?
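
   For illustration only (a sketch, not part of the actual diff; the parameter order follows
   the Javadoc above, and `BiConsumer<File, GroupPartitionId>` matches how `partConsumer` is
   invoked in `finishRecover`), the rename would keep the signature intact:

       public IgniteInternalFuture<Void> requestRemoteSnapshot(
           Map<Integer, Set<Integer>> parts,
           UUID rmtNodeId,
           BiConsumer<File, GroupPartitionId> partConsumer) {
           // Body unchanged from createRemoteSnapshot; only the method name differs.
       }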

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services