You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2022/06/15 11:13:00 UTC
[jira] [Comment Edited] (HDDS-6890) EC: Potential read digest mismatch with over-replicated container.

    [ https://issues.apache.org/jira/browse/HDDS-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554505#comment-17554505 ] 

Mark Gui edited comment on HDDS-6890 at 6/15/22 11:12 AM:
----------------------------------------------------------

[~sodonnell] Because In BlockInputStream, we'll get a *Standalone pipeline* built in `ECBlockInputStream#

getOrOpenStream()`, here we are actually refreshing the standalone one instead of the EC one.
{code:java}
// ECBlockInputStream.java
protected BlockExtendedInputStream E(int locationIndex) {
  BlockExtendedInputStream stream = blockStreams[locationIndex];
  if (stream == null) {
    // To read an EC block, we create a STANDALONE pipeline that contains the
    // single location for the block index we want to read. The EC blocks are
    // indexed from 1 to N, however the data locations are stored in the
    // dataLocations array indexed from zero.
    Pipeline pipeline = Pipeline.newBuilder()
        .setReplicationConfig(StandaloneReplicationConfig.getInstance(
            HddsProtos.ReplicationFactor.ONE))                         // <- This is the pipeline we are refreshing here
        .setNodes(Arrays.asList(dataLocations[locationIndex]))
        .setId(PipelineID.randomId())
        .setState(Pipeline.PipelineState.CLOSED)
        .build();

    BlockLocationInfo blkInfo = new BlockLocationInfo.Builder()
        .setBlockID(blockInfo.getBlockID())
        .setLength(internalBlockLength(locationIndex + 1))
        .setPipeline(blockInfo.getPipeline())
        .setToken(blockInfo.getToken())
        .setPartNumber(blockInfo.getPartNumber())
        .build();
    stream = streamFactory.create(
        StandaloneReplicationConfig.getInstance(
            HddsProtos.ReplicationFactor.ONE),
        blkInfo, pipeline,
        blockInfo.getToken(), verifyChecksum, xceiverClientFactory,
        refreshFunction);
    blockStreams[locationIndex] = stream;
  }
  return stream;
} {code}
Oh, of course I think you are right, we should go to the ECPipelineProvider here based on the replicationConfig of the container.


was (Author: markgui):
[~sodonnell] Because In BlockInputStream, we'll get a *Standalone pipeline* built in `ECBlockInputStream#

getOrOpenStream()`, here we are actually refreshing the standalone one instead of the EC one.
{code:java}
// ECBlockInputStream.java
protected BlockExtendedInputStream E(int locationIndex) {
  BlockExtendedInputStream stream = blockStreams[locationIndex];
  if (stream == null) {
    // To read an EC block, we create a STANDALONE pipeline that contains the
    // single location for the block index we want to read. The EC blocks are
    // indexed from 1 to N, however the data locations are stored in the
    // dataLocations array indexed from zero.
    Pipeline pipeline = Pipeline.newBuilder()
        .setReplicationConfig(StandaloneReplicationConfig.getInstance(
            HddsProtos.ReplicationFactor.ONE))                         // <- This is the pipeline we are refreshing here
        .setNodes(Arrays.asList(dataLocations[locationIndex]))
        .setId(PipelineID.randomId())
        .setState(Pipeline.PipelineState.CLOSED)
        .build();

    BlockLocationInfo blkInfo = new BlockLocationInfo.Builder()
        .setBlockID(blockInfo.getBlockID())
        .setLength(internalBlockLength(locationIndex + 1))
        .setPipeline(blockInfo.getPipeline())
        .setToken(blockInfo.getToken())
        .setPartNumber(blockInfo.getPartNumber())
        .build();
    stream = streamFactory.create(
        StandaloneReplicationConfig.getInstance(
            HddsProtos.ReplicationFactor.ONE),
        blkInfo, pipeline,
        blockInfo.getToken(), verifyChecksum, xceiverClientFactory,
        refreshFunction);
    blockStreams[locationIndex] = stream;
  }
  return stream;
} {code}
 

 

> EC: Potential read digest mismatch with over-replicated container.
> ------------------------------------------------------------------
>
>                 Key: HDDS-6890
>                 URL: https://issues.apache.org/jira/browse/HDDS-6890
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Priority: Major
>
> We've encountered a potential issue on the read path when we have over-replicated EC container replicas after Offline Recovery.
> Here is the scenario:
>  # 20 DNs, EC 10+4
>  # A background script keep writing and reading keys.(freon ockg & ockv)
>  # A background script that randomly kills DNs (< 4 for each round), wait for all containers to recover to the expected number of replicas(14). Then go for next round.
>  # After there are 8 killed DNs, restart all dead DNs before.
>  # Then we have potentially over-replicated containers.
>  # At that time, there are concurrent ockv and container delete commands in the cluster.
>  # And the ockv reports a digest mismatch during the short period of several minutes.
>  # But after all the deletions are done and all containers return to an expected number of replicas(14), the ockv command returns successfully.
> Outputs for the ockv command below:
> {code:java}
> 2022-06-10 13:21:27,923 [main] INFO storage.BlockInputStream: Unable to read information for block conID: 12 locID: 109611004723202307 bcsId: 0 from pipeline PipelineID=35db70e6-f362-4b3a-ab98-a89509547455: ContainerID 12 does not exist
> 2022-06-10 13:21:28,613 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:29,614 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:30,614 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:31,922 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:32,922 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:33,922 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:33,978 [pool-2-thread-1] ERROR scm.XceiverClientGrpc: Failed to execute command cmdType: GetBlock
> traceID: ""
> containerID: 12
> datanodeUuid: "5b542d9f-5e91-4cc2-af48-af3c33b8c472"
> getBlock {
>   blockID {
>     containerID: 12
>     localID: 109611004723202307
>     blockCommitSequenceId: 0
>   }
> }
>  on the pipeline Pipeline[ Id: 8151bbc7-672c-42d6-9c40-c80ce69d4b3f, Nodes: 5b542d9f-5e91-4cc2-af48-af3c33b8c472{ip: 9.27.142.97, host: 9.27.142.97, ports: [REPLICATION=3096, RATIS=3092, RATIS_ADMIN=3095, RATIS_SERVER=3094, STANDALONE=3093], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2022-06-10T13:21:33.921+08:00[Asia/Shanghai]].
> 2022-06-10 13:21:34,014 [pool-2-thread-1] INFO storage.BlockInputStream: Unable to read information for block conID: 12 locID: 109611004723202307 bcsId: 0 from pipeline PipelineID=8151bbc7-672c-42d6-9c40-c80ce69d4b3f: ContainerID 12 does not exist
> 2022-06-10 13:21:34,142 [pool-2-thread-5] ERROR scm.XceiverClientGrpc: Failed to execute command cmdType: GetBlock
> traceID: ""
> containerID: 12
> datanodeUuid: "5b542d9f-5e91-4cc2-af48-af3c33b8c472"
> getBlock {
>   blockID {
>     containerID: 12
>     localID: 109611004723202316
>     blockCommitSequenceId: 0
>   }
> }
>  on the pipeline Pipeline[ Id: f421fc76-eebb-4474-8c77-c57944144df6, Nodes: 5b542d9f-5e91-4cc2-af48-af3c33b8c472{ip: 9.27.142.97, host: 9.27.142.97, ports: [REPLICATION=3096, RATIS=3092, RATIS_ADMIN=3095, RATIS_SERVER=3094, STANDALONE=3093], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2022-06-10T13:21:34.074+08:00[Asia/Shanghai]].
> 2022-06-10 13:21:34,143 [pool-2-thread-5] INFO storage.BlockInputStream: Unable to read information for block conID: 12 locID: 109611004723202316 bcsId: 0 from pipeline PipelineID=f421fc76-eebb-4474-8c77-c57944144df6: ContainerID 12 does not exist
> 2022-06-10 13:21:34,315 [pool-2-thread-7] ERROR scm.XceiverClientGrpc: Failed to execute command cmdType: GetBlock
> traceID: ""
> containerID: 12
> datanodeUuid: "5b542d9f-5e91-4cc2-af48-af3c33b8c472"
> getBlock {
>   blockID {
>     containerID: 12
>     localID: 109611004723202310
>     blockCommitSequenceId: 0
>   }
> }
>  on the pipeline Pipeline[ Id: 21b59767-0ea9-4214-9f56-61fd31d795f2, Nodes: 5b542d9f-5e91-4cc2-af48-af3c33b8c472{ip: 9.27.142.97, host: 9.27.142.97, ports: [REPLICATION=3096, RATIS=3092, RATIS_ADMIN=3095, RATIS_SERVER=3094, STANDALONE=3093], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2022-06-10T13:21:34.215+08:00[Asia/Shanghai]].
> 2022-06-10 13:21:34,315 [pool-2-thread-7] INFO storage.BlockInputStream: Unable to read information for block conID: 12 locID: 109611004723202310 bcsId: 0 from pipeline PipelineID=21b59767-0ea9-4214-9f56-61fd31d795f2: ContainerID 12 does not exist
> 2022-06-10 13:21:35,402 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:36,418 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:37,419 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 10)
> 2022-06-10 13:21:38,001 [pool-2-thread-5] ERROR freon.BaseFreonGenerator: Error on executing task 4
> java.lang.IllegalStateException: Reference (=first) message digest doesn't match with digest of stripe_align_1654804614/4
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyValidator.validateDigest(OzoneClientKeyValidator.java:169)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyValidator.validateKey(OzoneClientKeyValidator.java:133)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:184)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:164)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$1(BaseFreonGenerator.java:147)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2022-06-10 13:21:38,419 [Thread-4] INFO freon.ProgressBar: Progress: 10.00 % (1 out of 10)
> 2022-06-10 13:21:39,419 [Thread-4] INFO freon.ProgressBar: Progress: 100.00 % (10 out of 10)
> One ore more freon test is failed.
> 2022-06-10 13:21:39,823 [shutdown-hook-0] INFO metrics: type=TIMER, name=key-validate, count=10, min=5918.240321, max=7710.826968, mean=6891.752193143025, stddev=481.39096683740087, median=6798.957973, p75=7302.544457, p95=7710.826968, p98=7710.826968, p99=7710.826968, p999=7710.826968, mean_rate=1.1586365637687632, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds
> 2022-06-10 13:21:39,824 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Total execution time (sec): 22
> 2022-06-10 13:21:39,824 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Failures: 1
> 2022-06-10 13:21:39,824 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Successful executions: 9
>  [check_object] size=52428800 prefix=stripe_align_1654804614 fail!!!! {code}
> I've got some analysis for this problem below:
> The key clue of this problem is the following log:
> {code:java}
> 2022-06-10 13:21:27,923 [main] INFO storage.BlockInputStream: Unable to read information for block conID: 12 locID: 109611004723202307 bcsId: 0 from pipeline PipelineID=35db70e6-f362-4b3a-ab98-a89509547455: ContainerID 12 does not exist {code}
> This should be caused by a container delete command received before the ReadChunk request reached the DN.
> And `BlockInputStream` will try to refresh the pipeline and retry read, see:
> {code:java}
> // BlockInputStream.java
> private void refreshPipeline(IOException cause) throws IOException {
>   LOG.info("Unable to read information for block {} from pipeline {}: {}",
>       blockID, pipeline.getId(), cause.getMessage());
>   if (refreshPipelineFunction != null) {
>     LOG.debug("Re-fetching pipeline for block {}", blockID);
>     Pipeline newPipeline = refreshPipelineFunction.apply(blockID);
>     if (newPipeline == null || newPipeline.sameDatanodes(pipeline)) {
>       LOG.warn("No new pipeline for block {}", blockID);
>       throw cause;
>     } else {
>       LOG.debug("New pipeline got for block {}", blockID);
>       this.pipeline = newPipeline;
>     }
>   } else {
>     throw cause;
>   }
> } {code}
> But actually here the pipeline we try to refresh from SCM is a {*}Standalone pipeline{*}, so after SCM got this request via the SCM interface `getContainerWithPipelineBatch`, it will return a strange Standalone pipeline with all the DNs for this container in the class `SimplePipelineProvider`, see:
> {code:java}
> // SimplePipelineProvider.java
> @Override
> public Pipeline createForRead(StandaloneReplicationConfig replicationConfig,
>     Set<ContainerReplica> replicas) {
>   return create(replicationConfig, replicas
>       .stream()
>       .map(ContainerReplica::getDatanodeDetails)
>       .collect(Collectors.toList()));
> } {code}
> So for the retry read, it will pick the first node to read regardless of the replicaIndex, so we will get wrong chunks from the first node to read and thus the mismatch problem above, see:
> {code:java}
> // XceiverClientGrpc.java
> public void connect() throws Exception {
>   // connect to the closest node, if closest node doesn't exist, delegate to
>   // first node, which is usually the leader in the pipeline.
>   DatanodeDetails dn = topologyAwareRead ? this.pipeline.getClosestNode() :
>       this.pipeline.getFirstNode();
>   // just make a connection to the picked datanode at the beginning
>   connectToDatanode(dn);
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org