Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2022/02/25 11:47:00 UTC
[jira] [Commented] (HDDS-6260) EC: Validate container close workflow is OK with EC Containers

    [ https://issues.apache.org/jira/browse/HDDS-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498083#comment-17498083 ] 

Stephen O'Donnell commented on HDDS-6260:
-----------------------------------------

The current container replica state logic mostly assumes RATIS containers. The QUASI_CLOSED state is defined as:

{quote}

The replica could not successfully transition to closed state because the Raft pipeline does not exist anymore.

{quote}

In the case of EC, we never have a Raft pipeline, so the containers should never be allowed into this state. However, as I mentioned in the Jira description, this is happening. It appears to be due to the close container handling code in the DN, which first marks the replica as CLOSING and then does the following:


{code:java}
case CLOSING:
  // If the container is part of open pipeline, close it via write channel
  if (ozoneContainer.getWriteChannel()
      .isExist(closeCommand.getPipelineID())) {
    ContainerCommandRequestProto request =
        getContainerCommandRequestProto(datanodeDetails,
            closeCommand.getContainerID(),
            command.getEncodedToken());
    ozoneContainer.getWriteChannel()
        .submitRequest(request, closeCommand.getPipelineID());
  } else {
    controller.quasiCloseContainer(containerId);
    LOG.info("Marking Container {} quasi closed", containerId);
  }
  break;
{code}
I think this means an EC replica will always end up as QUASI_CLOSED.
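
To make the branch above concrete, here is a minimal sketch that models only the decision, not the datanode code itself. The class, enum, and method names are hypothetical, and the "pipeline exists" outcome is simplified to CLOSED even though the real path goes through a close request on the write channel.

```java
final class CloseBranchSketch {
  enum ReplicaState { CLOSING, QUASI_CLOSED, CLOSED }

  // Mirrors the if/else in the CLOSING case: if the Raft pipeline is still
  // known, the close goes through the write channel (simplified to CLOSED
  // here); otherwise the replica is quasi closed.
  static ReplicaState afterCloseCommand(boolean raftPipelineExists) {
    return raftPipelineExists
        ? ReplicaState.CLOSED
        : ReplicaState.QUASI_CLOSED;
  }

  public static void main(String[] args) {
    // RATIS replica whose pipeline is still alive.
    System.out.println(afterCloseCommand(true));
    // EC replica: there is never a Raft pipeline to find.
    System.out.println(afterCloseCommand(false));
  }
}
```

Because the lookup can never succeed for EC, the second call is the only path an EC replica can take in this model.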

In normal container report handling, when SCM sees a QUASI_CLOSED replica for a container, it moves the container to QUASI_CLOSED. The only way for the container to move to CLOSED is for a DN to report a CLOSED replica. At this point ReplicationManager needs to get involved to move a replica to CLOSED.

It does this by checking if the container can be force closed, which means more than 50% of the replicas with a unique origin node ID are in QUASI_CLOSED state:


{code:java}
/**
 * Returns true if more than 50% of the container replicas with unique
 * originNodeId are in QUASI_CLOSED state.
 *
 * @param container Container to check
 * @param replicas Set of ContainerReplicas
 * @return true if we can force close the container, false otherwise
 */
private boolean canForceCloseContainer(final ContainerInfo container,
    final Set<ContainerReplica> replicas) {
  Preconditions.assertTrue(container.getState() ==
      LifeCycleState.QUASI_CLOSED);
  final int replicationFactor =
      container.getReplicationConfig().getRequiredNodes();
  final long uniqueQuasiClosedReplicaCount = replicas.stream()
      .filter(r -> r.getState() == State.QUASI_CLOSED)
      .map(ContainerReplica::getOriginDatanodeId)
      .distinct()
      .count();
  return uniqueQuasiClosedReplicaCount > (replicationFactor / 2);
}
{code}
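
The threshold in the check above uses integer division, so it is worth working through the numbers. The sketch below repeats only the final comparison; the class and method names are hypothetical. It assumes an EC scheme with 3 data and 2 parity replicas and that getRequiredNodes() returns 5 for it, neither of which is stated in this comment.

```java
final class QuorumSketch {
  // Same comparison as the last line of canForceCloseContainer:
  // strictly more than half of the required nodes, with integer division.
  static boolean canForceClose(long uniqueQuasiClosed, int requiredNodes) {
    return uniqueQuasiClosed > (requiredNodes / 2);
  }

  public static void main(String[] args) {
    // RATIS with 3 replicas: 3 / 2 == 1, so 2 quasi closed replicas suffice.
    System.out.println(canForceClose(2, 3));
    // Assumed EC 3+2: 5 / 2 == 2, so 2 replicas are not enough ...
    System.out.println(canForceClose(2, 5));
    // ... but 3 are.
    System.out.println(canForceClose(3, 5));
  }
}
```

Under those assumptions, an EC container passes the check once three of its five replicas report QUASI_CLOSED.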
ReplicationManager then force closes the replicas with the highest block commit sequence ID (BCSID):


{code:java}
/**
 * Force close the container replica(s) with highest sequence Id.
 *
 * <p>
 *   Note: We should force close the container only if >50% (quorum)
 *   of replicas with unique originNodeId are in QUASI_CLOSED state.
 * </p>
 *
 * @param container ContainerInfo
 * @param replicas Set of ContainerReplicas
 */
private void forceCloseContainer(final ContainerInfo container,
                                 final Set<ContainerReplica> replicas) {
  Preconditions.assertTrue(container.getState() ==
      LifeCycleState.QUASI_CLOSED);

  final List<ContainerReplica> quasiClosedReplicas = replicas.stream()
      .filter(r -> r.getState() == State.QUASI_CLOSED)
      .collect(Collectors.toList());

  final Long sequenceId = quasiClosedReplicas.stream()
      .map(ContainerReplica::getSequenceId)
      .max(Long::compare)
      .orElse(-1L);

  LOG.info("Force closing container {} with BCSID {}," +
      " which is in QUASI_CLOSED state.",
      container.containerID(), sequenceId);

  quasiClosedReplicas.stream()
      .filter(r -> sequenceId != -1L)
      .filter(replica -> replica.getSequenceId().equals(sequenceId))
      .forEach(replica -> sendCloseCommand(
          container, replica.getDatanodeDetails(), true));
}
{code}
Later, the replicas with a lower BCSID are removed when unstable containers are handled, but right now that code is excluded from processing EC containers, as we have not designed EC recovery yet.

As it stands, I think some replicas of an EC container will be stuck in QUASI_CLOSED forever if the BCSIDs do not match, and they may not match, as small keys will not update some of the data containers.
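
The following sketch illustrates how replicas can be left behind. It applies the same selection rule as forceCloseContainer (only replicas at the maximum sequence ID get a close command) to made-up data. The Replica class is a hypothetical stand-in for ContainerReplica, and the BCSID values are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class BcsidSelectionSketch {
  // Hypothetical stand-in for ContainerReplica: EC replica index plus BCSID.
  static final class Replica {
    final int index;
    final long bcsid;

    Replica(int index, long bcsid) {
      this.index = index;
      this.bcsid = bcsid;
    }
  }

  // Returns the replica indexes that would be sent a force close command:
  // those whose BCSID equals the maximum among the quasi closed replicas.
  static List<Integer> indexesToClose(List<Replica> quasiClosed) {
    final long max = quasiClosed.stream()
        .mapToLong(r -> r.bcsid)
        .max()
        .orElse(-1L);
    return quasiClosed.stream()
        .filter(r -> max != -1L)
        .filter(r -> r.bcsid == max)
        .map(r -> r.index)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Invented values: indexes 2 and 3 missed small-key writes and lag behind.
    List<Replica> replicas = Arrays.asList(
        new Replica(1, 10L), new Replica(2, 8L), new Replica(3, 8L),
        new Replica(4, 10L), new Replica(5, 10L));
    System.out.println(indexesToClose(replicas)); // prints [1, 4, 5]
  }
}
```

In this example replicas 2 and 3 never receive a close command, and with the unstable container handling skipped for EC, nothing else moves them out of QUASI_CLOSED.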

I also believe we should exclude EC containers from entering the QUASI_CLOSED state, as it is related to RATIS, which EC does not use.
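
One possible shape for such an exclusion is sketched below. This is purely illustrative: the names are hypothetical, and the choice to move an EC replica straight to CLOSED when no pipeline is found is an assumption made for the example, not a decision taken in this Jira.

```java
final class EcCloseGuardSketch {
  enum ReplicationType { RATIS, EC }
  enum ReplicaState { QUASI_CLOSED, CLOSED }

  // State chosen for a CLOSING replica when no Raft pipeline is found.
  static ReplicaState stateWithoutPipeline(ReplicationType type) {
    if (type == ReplicationType.EC) {
      // Assumption: EC has no Raft pipeline by design, so its absence
      // says nothing about the replica and it is closed directly.
      return ReplicaState.CLOSED;
    }
    // RATIS keeps today's behaviour: the pipeline is gone, so quasi close.
    return ReplicaState.QUASI_CLOSED;
  }

  public static void main(String[] args) {
    System.out.println(stateWithoutPipeline(ReplicationType.RATIS));
    System.out.println(stateWithoutPipeline(ReplicationType.EC));
  }
}
```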

The question then is, does EC require some other consensus to close down a container? Should we ensure the BCSID is maintained across all the replicas, even if we don't always need to write to a given replica (for small keys)? Then we would know a replica is incomplete if the BCSID does not match across all replicas.

 

> EC: Validate container close workflow is OK with EC Containers
> --------------------------------------------------------------
>
>                 Key: HDDS-6260
>                 URL: https://issues.apache.org/jira/browse/HDDS-6260
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> After closing the container with "ozone admin container close 2", the container does not progress beyond QUASI_CLOSED until ReplicationManager processes it, at which point it is force closed if the sequence numbers of its replicas all have the same value.
> {code}
> bash-4.2$ ozone admin container info 2
> Container id: 2
> Pipeline id: 8a62f588-0fff-497d-9589-aa14ef43c86b
> Container State: QUASI_CLOSED
> Datanodes: [aef6aa64-1251-4707-aaaa-141c9d03b6b1/ozone_datanode_2.ozone_default,
> 9b650b01-a361-471a-98a8-2aa4daff5fee/ozone_datanode_5.ozone_default,
> d4d57905-08c8-4f4f-8a5f-cf04b066b82c/ozone_datanode_3.ozone_default]
> Replicas: [State: QUASI_CLOSED; ReplicaIndex: 5; Origin: aef6aa64-1251-4707-aaaa-141c9d03b6b1; Location: aef6aa64-1251-4707-aaaa-141c9d03b6b1/ozone_datanode_2.ozone_default,
> State: QUASI_CLOSED; ReplicaIndex: 4; Origin: 9b650b01-a361-471a-98a8-2aa4daff5fee; Location: 9b650b01-a361-471a-98a8-2aa4daff5fee/ozone_datanode_5.ozone_default,
> State: QUASI_CLOSED; ReplicaIndex: 1; Origin: d4d57905-08c8-4f4f-8a5f-cf04b066b82c; Location: d4d57905-08c8-4f4f-8a5f-cf04b066b82c/ozone_datanode_3.ozone_default]
> {code}
> Need to check if the container close handling and quasi_close handling need any changes for EC.


