You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2021/10/04 14:13:57 UTC

[GitHub] [hbase] Apache9 commented on a change in pull request #3716: HBASE-26323 introduce a SnapshotProcedure

Apache9 commented on a change in pull request #3716:
URL: https://github.com/apache/hbase/pull/3716#discussion_r721369546



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java
##########
@@ -1870,50 +1870,62 @@ public void onComplete() {
     } catch (IllegalArgumentException e) {
       return failedFuture(e);
     }
-    CompletableFuture<Void> future = new CompletableFuture<>();
-    final SnapshotRequest request = SnapshotRequest.newBuilder().setSnapshot(snapshot).build();
-    addListener(this.<Long> newMasterCaller()
-      .action((controller, stub) -> this.<SnapshotRequest, SnapshotResponse, Long> call(controller,
-        stub, request, (s, c, req, done) -> s.snapshot(c, req, done),
-        resp -> resp.getExpectedTimeout()))
-      .call(), (expectedTimeout, err) -> {
-        if (err != null) {
-          future.completeExceptionally(err);
-          return;
-        }
-        TimerTask pollingTask = new TimerTask() {
-          int tries = 0;
-          long startTime = EnvironmentEdgeManager.currentTime();
-          long endTime = startTime + expectedTimeout;
-          long maxPauseTime = expectedTimeout / maxAttempts;
 
-          @Override
-          public void run(Timeout timeout) throws Exception {
-            if (EnvironmentEdgeManager.currentTime() < endTime) {
-              addListener(isSnapshotFinished(snapshotDesc), (done, err2) -> {
-                if (err2 != null) {
-                  future.completeExceptionally(err2);
-                } else if (done) {
-                  future.complete(null);
-                } else {
-                  // retry again after pauseTime.
-                  long pauseTime =
-                    ConnectionUtils.getPauseTime(TimeUnit.NANOSECONDS.toMillis(pauseNs), ++tries);
-                  pauseTime = Math.min(pauseTime, maxPauseTime);
-                  AsyncConnectionImpl.RETRY_TIMER.newTimeout(this, pauseTime,
-                    TimeUnit.MILLISECONDS);
-                }
-              });
-            } else {
-              future.completeExceptionally(
-                new SnapshotCreationException("Snapshot '" + snapshot.getName() +
-                  "' wasn't completed in expectedTime:" + expectedTimeout + " ms", snapshotDesc));
-            }
+    boolean zkCoordinated = connection.getConfiguration()

Review comment:
       I do not think this is the correct way for client implementation. We should not rely on a client side configuraton to determine which method to call.
   By default, if we only introduce this in a major version, then it is OK to just call the new method, as we do not need to maintain compatibility with previous version(though it is a nice to have), if you want to implement this in a minor version, then it will be better to implement some fallback logic to support both methods. Maybe a possible way is to get the version of the server and determine which method to call?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotProcedure.java
##########
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.master.procedure;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.RegionReplicaUtil;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
+import org.apache.hadoop.hbase.client.TableState;
+import org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher;
+import org.apache.hadoop.hbase.master.MasterCoprocessorHost;
+import org.apache.hadoop.hbase.master.MetricsSnapshot;
+import org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier;
+import org.apache.hadoop.hbase.mob.MobUtils;
+import org.apache.hadoop.hbase.monitoring.MonitoredTask;
+import org.apache.hadoop.hbase.monitoring.TaskMonitor;
+import org.apache.hadoop.hbase.procedure2.Procedure;
+import org.apache.hadoop.hbase.procedure2.ProcedureStateSerializer;
+import org.apache.hadoop.hbase.procedure2.ProcedureSuspendedException;
+import org.apache.hadoop.hbase.procedure2.ProcedureYieldException;
+import org.apache.hadoop.hbase.snapshot.ClientSnapshotDescriptionUtils;
+import org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException;
+import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
+import org.apache.hadoop.hbase.snapshot.SnapshotManifest;
+import org.apache.hadoop.hbase.util.CommonFSUtils;
+import org.apache.hadoop.hbase.util.ModifyRegionUtils;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.SnapshotProcedureStateData;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.SnapshotState;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription;
+
+/**
+ *  A procedure used to take snapshot on tables.
+ */
+@InterfaceAudience.Private
+public class SnapshotProcedure
+    extends AbstractStateMachineTableProcedure<SnapshotState> {
+  private static final Logger LOG = LoggerFactory.getLogger(SnapshotProcedure.class);
+  private final MetricsSnapshot metricsSnapshot = new MetricsSnapshot();
+
+  private Configuration conf;
+  private SnapshotDescription snapshot;
+  private Path rootDir;
+  private Path snapshotDir;
+  private Path workingDir;
+  private FileSystem workingDirFS;
+  private TableName snapshotTable;
+  private MonitoredTask status;
+  private SnapshotManifest snapshotManifest;
+  private TableDescriptor htd;
+  private ForeignExceptionDispatcher monitor;
+
+  public SnapshotProcedure() { }
+
+  public SnapshotProcedure(final MasterProcedureEnv env, final SnapshotDescription snapshot) {
+    super(env);
+    this.snapshot = snapshot;
+  }
+
+  @Override
+  public TableName getTableName() {
+    return TableName.valueOf(snapshot.getTable());
+  }
+
+  @Override
+  public TableOperationType getTableOperationType() {
+    return TableOperationType.SNAPSHOT;
+  }
+
+  @Override
+  protected LockState acquireLock(MasterProcedureEnv env) {
+    // AbstractStateMachineTableProcedure acquires exclusive table lock by default,
+    // but we may need to downgrade it to shared lock for some reasons:
+    // a. exclusive lock has a negative effect on assigning region. See HBASE-21480 for details.
+    // b. we want to support taking multiple different snapshots on same table on the same time.
+    if (env.getProcedureScheduler().waitTableSharedLock(this, getTableName())) {
+      return LockState.LOCK_EVENT_WAIT;
+    }
+    return LockState.LOCK_ACQUIRED;
+  }
+
+  @Override
+  protected void releaseLock(MasterProcedureEnv env) {
+    env.getProcedureScheduler().wakeTableSharedLock(this, getTableName());
+  }
+
+  @Override
+  protected boolean holdLock(MasterProcedureEnv env) {
+    // In order to avoid enabling/disabling/modifying/deleting table during snapshot,
+    // we don't release lock during suspend
+    return true;
+  }
+
+  @Override
+  protected Flow executeFromState(MasterProcedureEnv env, SnapshotState state)
+      throws ProcedureSuspendedException, ProcedureYieldException, InterruptedException {
+    LOG.info("{} execute state={}", this, state);
+
+    try {
+      switch (state) {
+        case SNAPSHOT_PREPARE:
+          prepareSnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_PRE_OPERATION);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_PRE_OPERATION:
+          preSnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_WRITE_SNAPSHOT_INFO);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_WRITE_SNAPSHOT_INFO:
+          SnapshotDescriptionUtils.writeSnapshotInfo(snapshot, workingDir, workingDirFS);
+          TableState tableState =
+            env.getMasterServices().getTableStateManager().getTableState(snapshotTable);
+          if (tableState.isEnabled()) {
+            setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_ONLINE_REGIONS);
+          } else if (tableState.isDisabled()) {
+            setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS);
+          }
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_SNAPSHOT_ONLINE_REGIONS:
+          addChildProcedure(createRemoteSnapshotProcedures(env));
+          setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS:
+          snapshotOfflineRegions(env);
+          if (MobUtils.hasMobColumns(htd)) {
+            setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_MOB_REGION);
+          } else {
+            setNextState(SnapshotState.SNAPSHOT_CONSOLIDATE_SNAPSHOT);
+          }
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_SNAPSHOT_MOB_REGION:
+          snapshotMobRegion();
+          setNextState(SnapshotState.SNAPSHOT_CONSOLIDATE_SNAPSHOT);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_CONSOLIDATE_SNAPSHOT:
+          // flush the in-memory state, and write the single manifest
+          status.setStatus("Consolidate snapshot: " + snapshot.getName());
+          snapshotManifest.consolidate();
+          setNextState(SnapshotState.SNAPSHOT_VERIFIER_SNAPSHOT);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_VERIFIER_SNAPSHOT:
+          status.setStatus("Verifying snapshot: " + snapshot.getName());
+          verifySnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_COMPLETE_SNAPSHOT);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_COMPLETE_SNAPSHOT:
+          completeSnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_POST_OPERATION);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_POST_OPERATION:
+          postSnapshot(env);
+          return Flow.NO_MORE_STATE;
+        default:
+          throw new UnsupportedOperationException("unhandled state=" + state);
+      }
+    } catch (Exception e) {
+      if (e instanceof CorruptedSnapshotException) {
+        setNextState(SnapshotState.SNAPSHOT_PREPARE);
+        LOG.warn("{} looks got a corrupted snapshot, set next state to {} to retry", this,
+          SnapshotState.SNAPSHOT_PREPARE);
+        return Flow.HAS_MORE_STATE;
+      } else {
+        setFailure("master-snapshot", e);
+        LOG.warn("unexpected exception while execute {}", this, e);
+        return Flow.NO_MORE_STATE;
+      }
+    }
+  }
+
+  @Override
+  protected void rollbackState(MasterProcedureEnv env, SnapshotState state)
+    throws IOException, InterruptedException {
+    if (state == SnapshotState.SNAPSHOT_PRE_OPERATION) {
+      try {
+        if (!workingDirFS.delete(workingDir, true)) {
+          LOG.error("Couldn't delete snapshot working directory {}", workingDir);
+        }
+      } catch (IOException e) {
+        LOG.error("Couldn't delete snapshot working directory {}", workingDir, e);
+      }
+    }
+  }
+
+  @Override
+  protected boolean isRollbackSupported(SnapshotState state) {
+    return true;
+  }
+
+  @Override
+  protected SnapshotState getState(final int stateId) {
+    return SnapshotState.forNumber(stateId);
+  }
+
+  @Override
+  protected int getStateId(SnapshotState state) {
+    return state.getNumber();
+  }
+
+  @Override
+  protected SnapshotState getInitialState() {
+    return SnapshotState.SNAPSHOT_PREPARE;
+  }
+
+  private void prepareSnapshot(MasterProcedureEnv env) throws IOException {
+    this.conf = env.getMasterConfiguration();
+    this.snapshotTable = TableName.valueOf(snapshot.getTable());
+    this.htd = loadTableDescriptorSnapshot(env);
+    this.rootDir = CommonFSUtils.getRootDir(conf);
+    this.snapshotDir = SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshot, rootDir);
+    this.workingDir = SnapshotDescriptionUtils.getWorkingSnapshotDir(snapshot, rootDir, conf);
+    this.workingDirFS = workingDir.getFileSystem(conf);
+    this.status = TaskMonitor.get()
+      .createStatus("Taking " + snapshot.getType() + " snapshot on table: " + snapshotTable);
+    this.monitor = new ForeignExceptionDispatcher(snapshot.getName());
+    this.snapshotManifest = SnapshotManifest.create(conf,
+      workingDirFS, workingDir, snapshot, monitor, status);
+    this.snapshotManifest.addTableDescriptor(htd);
+  }
+
+  private TableDescriptor loadTableDescriptorSnapshot(MasterProcedureEnv env) throws IOException {
+    TableDescriptor htd = env.getMasterServices().getTableDescriptors().get(snapshotTable);
+    if (htd == null) {
+      throw new IOException("TableDescriptor missing for " + snapshotTable);
+    }
+    if (htd.getMaxFileSize() == -1 && this.snapshot.getMaxFileSize() > 0) {
+      return TableDescriptorBuilder.newBuilder(htd).setValue(TableDescriptorBuilder.MAX_FILESIZE,
+        Long.toString(this.snapshot.getMaxFileSize())).build();
+    }
+    return htd;
+  }
+
+  private void preSnapshot(MasterProcedureEnv env) throws IOException {
+    env.getMasterServices().getSnapshotManager().prepareWorkingDirectory(snapshot);
+
+    MasterCoprocessorHost cpHost = env.getMasterCoprocessorHost();
+    if (cpHost != null) {
+      cpHost.preSnapshot(ProtobufUtil.createSnapshotDesc(snapshot), htd);
+    }
+  }
+
+  private void postSnapshot(MasterProcedureEnv env) throws IOException {
+    MasterCoprocessorHost cpHost = env.getMasterCoprocessorHost();
+    if (cpHost != null) {
+      cpHost.postSnapshot(ProtobufUtil.createSnapshotDesc(snapshot), htd);
+    }
+  }
+
+  private void verifySnapshot(MasterProcedureEnv env) throws IOException {
+    int verifyThreshold = env.getMasterConfiguration()
+      .getInt("hbase.snapshot.remote.verify.threshold", 10000);
+    int numRegions = (int) env.getAssignmentManager()
+      .getTableRegions(snapshotTable, false)
+      .stream().filter(r -> RegionReplicaUtil.isDefaultReplica(r)).count();
+    MasterSnapshotVerifier verifier =
+      new MasterSnapshotVerifier(env.getMasterServices(), snapshot, workingDirFS);
+
+    if (numRegions >= verifyThreshold) {
+      verifier.verifySnapshot(false);

Review comment:
       We do not need to create the verifier for this case?

##########
File path: hbase-protocol-shaded/src/main/protobuf/server/master/Master.proto
##########
@@ -437,10 +437,14 @@ message IsCleanerChoreEnabledResponse {
 
 message SnapshotRequest {
   required SnapshotDescription snapshot = 1;
+  optional bool zkCoordinated = 2 [default = true];

Review comment:
       We should not allow client to specify the implementation?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotVerifyProcedure.java
##########
@@ -0,0 +1,196 @@
+/*
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.master.procedure;
+
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.procedure2.*;

Review comment:
       nits: avoid star imports.

##########
File path: hbase-protocol-shaded/src/main/protobuf/server/master/Master.proto
##########
@@ -437,10 +437,14 @@ message IsCleanerChoreEnabledResponse {
 
 message SnapshotRequest {
   required SnapshotDescription snapshot = 1;
+  optional bool zkCoordinated = 2 [default = true];
+  optional uint64 nonce_group = 3 [default = 0];
+  optional uint64 nonce = 4 [default = 0];
 }
 
 message SnapshotResponse {
   required int64 expected_timeout = 1;
+  optional int64 proc_id = 2;

Review comment:
       OK, so you do not introduce a new method. Then I think the implementation should be easier, just test whether the proc_id is set? If not, it should be the old implementation, otherwise it should be the new implementation.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotProcedure.java
##########
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.master.procedure;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.RegionReplicaUtil;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
+import org.apache.hadoop.hbase.client.TableState;
+import org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher;
+import org.apache.hadoop.hbase.master.MasterCoprocessorHost;
+import org.apache.hadoop.hbase.master.MetricsSnapshot;
+import org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier;
+import org.apache.hadoop.hbase.mob.MobUtils;
+import org.apache.hadoop.hbase.monitoring.MonitoredTask;
+import org.apache.hadoop.hbase.monitoring.TaskMonitor;
+import org.apache.hadoop.hbase.procedure2.Procedure;
+import org.apache.hadoop.hbase.procedure2.ProcedureStateSerializer;
+import org.apache.hadoop.hbase.procedure2.ProcedureSuspendedException;
+import org.apache.hadoop.hbase.procedure2.ProcedureYieldException;
+import org.apache.hadoop.hbase.snapshot.ClientSnapshotDescriptionUtils;
+import org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException;
+import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
+import org.apache.hadoop.hbase.snapshot.SnapshotManifest;
+import org.apache.hadoop.hbase.util.CommonFSUtils;
+import org.apache.hadoop.hbase.util.ModifyRegionUtils;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.SnapshotProcedureStateData;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.SnapshotState;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription;
+
+/**
+ *  A procedure used to take snapshot on tables.
+ */
+@InterfaceAudience.Private
+public class SnapshotProcedure
+    extends AbstractStateMachineTableProcedure<SnapshotState> {
+  private static final Logger LOG = LoggerFactory.getLogger(SnapshotProcedure.class);
+  private final MetricsSnapshot metricsSnapshot = new MetricsSnapshot();
+
+  private Configuration conf;
+  private SnapshotDescription snapshot;
+  private Path rootDir;
+  private Path snapshotDir;
+  private Path workingDir;
+  private FileSystem workingDirFS;
+  private TableName snapshotTable;
+  private MonitoredTask status;
+  private SnapshotManifest snapshotManifest;
+  private TableDescriptor htd;
+  private ForeignExceptionDispatcher monitor;
+
+  public SnapshotProcedure() { }
+
+  public SnapshotProcedure(final MasterProcedureEnv env, final SnapshotDescription snapshot) {
+    super(env);
+    this.snapshot = snapshot;
+  }
+
+  @Override
+  public TableName getTableName() {
+    return TableName.valueOf(snapshot.getTable());
+  }
+
+  @Override
+  public TableOperationType getTableOperationType() {
+    return TableOperationType.SNAPSHOT;
+  }
+
+  @Override
+  protected LockState acquireLock(MasterProcedureEnv env) {
+    // AbstractStateMachineTableProcedure acquires exclusive table lock by default,
+    // but we may need to downgrade it to shared lock for some reasons:
+    // a. exclusive lock has a negative effect on assigning region. See HBASE-21480 for details.
+    // b. we want to support taking multiple different snapshots on same table on the same time.
+    if (env.getProcedureScheduler().waitTableSharedLock(this, getTableName())) {
+      return LockState.LOCK_EVENT_WAIT;
+    }
+    return LockState.LOCK_ACQUIRED;
+  }
+
+  @Override
+  protected void releaseLock(MasterProcedureEnv env) {
+    env.getProcedureScheduler().wakeTableSharedLock(this, getTableName());
+  }
+
+  @Override
+  protected boolean holdLock(MasterProcedureEnv env) {
+    // In order to avoid enabling/disabling/modifying/deleting table during snapshot,
+    // we don't release lock during suspend
+    return true;
+  }
+
+  @Override
+  protected Flow executeFromState(MasterProcedureEnv env, SnapshotState state)
+      throws ProcedureSuspendedException, ProcedureYieldException, InterruptedException {
+    LOG.info("{} execute state={}", this, state);
+
+    try {
+      switch (state) {
+        case SNAPSHOT_PREPARE:
+          prepareSnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_PRE_OPERATION);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_PRE_OPERATION:
+          preSnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_WRITE_SNAPSHOT_INFO);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_WRITE_SNAPSHOT_INFO:
+          SnapshotDescriptionUtils.writeSnapshotInfo(snapshot, workingDir, workingDirFS);
+          TableState tableState =
+            env.getMasterServices().getTableStateManager().getTableState(snapshotTable);
+          if (tableState.isEnabled()) {
+            setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_ONLINE_REGIONS);
+          } else if (tableState.isDisabled()) {
+            setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS);
+          }
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_SNAPSHOT_ONLINE_REGIONS:
+          addChildProcedure(createRemoteSnapshotProcedures(env));
+          setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS:
+          snapshotOfflineRegions(env);
+          if (MobUtils.hasMobColumns(htd)) {
+            setNextState(SnapshotState.SNAPSHOT_SNAPSHOT_MOB_REGION);
+          } else {
+            setNextState(SnapshotState.SNAPSHOT_CONSOLIDATE_SNAPSHOT);
+          }
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_SNAPSHOT_MOB_REGION:
+          snapshotMobRegion();
+          setNextState(SnapshotState.SNAPSHOT_CONSOLIDATE_SNAPSHOT);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_CONSOLIDATE_SNAPSHOT:
+          // flush the in-memory state, and write the single manifest
+          status.setStatus("Consolidate snapshot: " + snapshot.getName());
+          snapshotManifest.consolidate();
+          setNextState(SnapshotState.SNAPSHOT_VERIFIER_SNAPSHOT);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_VERIFIER_SNAPSHOT:
+          status.setStatus("Verifying snapshot: " + snapshot.getName());
+          verifySnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_COMPLETE_SNAPSHOT);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_COMPLETE_SNAPSHOT:
+          completeSnapshot(env);
+          setNextState(SnapshotState.SNAPSHOT_POST_OPERATION);
+          return Flow.HAS_MORE_STATE;
+        case SNAPSHOT_POST_OPERATION:
+          postSnapshot(env);
+          return Flow.NO_MORE_STATE;
+        default:
+          throw new UnsupportedOperationException("unhandled state=" + state);
+      }
+    } catch (Exception e) {
+      if (e instanceof CorruptedSnapshotException) {

Review comment:
       Better add this catch to the specific state? We will jump to the beginning of the procedure, I do not think we can do this in later states?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotRegionProcedure.java
##########
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.master.procedure;
+
+import com.google.errorprone.annotations.RestrictedApi;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStateNode;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
+import org.apache.hadoop.hbase.master.assignment.ServerState;
+import org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.RegionSnapshotOperation;
+import org.apache.hadoop.hbase.procedure2.*;

Review comment:
       nits: avoid star imports.

##########
File path: hbase-protocol-shaded/src/main/protobuf/server/region/Admin.proto
##########
@@ -277,6 +284,7 @@ message ExecuteProceduresRequest {
   repeated OpenRegionRequest open_region = 1;
   repeated CloseRegionRequest close_region = 2;
   repeated RemoteProcedureRequest proc = 3;
+  repeated SnapshotRegionRequest snapshot_region = 4;

Review comment:
       It can not be implemented as a RemoteProcedure?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotProcedure.java
##########
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.master.procedure;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.RegionReplicaUtil;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
+import org.apache.hadoop.hbase.client.TableState;
+import org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher;
+import org.apache.hadoop.hbase.master.MasterCoprocessorHost;
+import org.apache.hadoop.hbase.master.MetricsSnapshot;
+import org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier;
+import org.apache.hadoop.hbase.mob.MobUtils;
+import org.apache.hadoop.hbase.monitoring.MonitoredTask;
+import org.apache.hadoop.hbase.monitoring.TaskMonitor;
+import org.apache.hadoop.hbase.procedure2.Procedure;
+import org.apache.hadoop.hbase.procedure2.ProcedureStateSerializer;
+import org.apache.hadoop.hbase.procedure2.ProcedureSuspendedException;
+import org.apache.hadoop.hbase.procedure2.ProcedureYieldException;
+import org.apache.hadoop.hbase.snapshot.ClientSnapshotDescriptionUtils;
+import org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException;
+import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
+import org.apache.hadoop.hbase.snapshot.SnapshotManifest;
+import org.apache.hadoop.hbase.util.CommonFSUtils;
+import org.apache.hadoop.hbase.util.ModifyRegionUtils;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.SnapshotProcedureStateData;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.SnapshotState;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription;
+
+/**
+ *  A procedure used to take snapshot on tables.
+ */
+@InterfaceAudience.Private
+public class SnapshotProcedure
+    extends AbstractStateMachineTableProcedure<SnapshotState> {
+  private static final Logger LOG = LoggerFactory.getLogger(SnapshotProcedure.class);
+  private final MetricsSnapshot metricsSnapshot = new MetricsSnapshot();
+
+  private Configuration conf;
+  private SnapshotDescription snapshot;
+  private Path rootDir;
+  private Path snapshotDir;
+  private Path workingDir;
+  private FileSystem workingDirFS;
+  private TableName snapshotTable;
+  private MonitoredTask status;
+  private SnapshotManifest snapshotManifest;
+  private TableDescriptor htd;
+  private ForeignExceptionDispatcher monitor;
+
+  public SnapshotProcedure() { }
+
+  public SnapshotProcedure(final MasterProcedureEnv env, final SnapshotDescription snapshot) {
+    super(env);
+    this.snapshot = snapshot;
+  }
+
+  @Override
+  public TableName getTableName() {
+    return TableName.valueOf(snapshot.getTable());
+  }
+
+  @Override
+  public TableOperationType getTableOperationType() {
+    return TableOperationType.SNAPSHOT;
+  }
+
+  @Override
+  protected LockState acquireLock(MasterProcedureEnv env) {
+    // AbstractStateMachineTableProcedure acquires exclusive table lock by default,
+    // but we may need to downgrade it to shared lock for some reasons:
+    // a. exclusive lock has a negative effect on assigning region. See HBASE-21480 for details.
+    // b. we want to support taking multiple different snapshots on same table on the same time.
+    if (env.getProcedureScheduler().waitTableSharedLock(this, getTableName())) {

Review comment:
       Is this enough? The SplitTableRegionProcedure and MergeTableRegionsProcedure also take shared lock on table, so taking shared lock can not prevent split/merge at the same time.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org