Posted to issues@ozone.apache.org by "George Huang (Jira)" <ji...@apache.org> on 2021/05/11 15:30:00 UTC

[jira] [Updated] (HDDS-5216) ozone freon randomkeys failed after leader SCM node is down

     [ https://issues.apache.org/jira/browse/HDDS-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Huang updated HDDS-5216:
-------------------------------
    Attachment: hdds-5216.zip

> ozone freon randomkeys failed after leader SCM node is down
> -----------------------------------------------------------
>
>                 Key: HDDS-5216
>                 URL: https://issues.apache.org/jira/browse/HDDS-5216
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>    Affects Versions: 1.2.0
>            Reporter: George Huang
>            Priority: Major
>         Attachments: hdds-5216.zip
>
>
> Under .../compose/ozone-ha, create an HA cluster:
> docker-compose up -d --scale datanode=3
> The initial SCM roles were as follows:
> bash-4.2$ ozone admin scm roles
> [scm1:9865:FOLLOWER, scm2:9865:FOLLOWER, scm3:9865:LEADER]
> Run the freon random key generator as follows:
> ozone freon randomkeys --numOfVolumes=10 --numOfBuckets 50 --numOfKeys 50  --replicationType=RATIS --factor=THREE
> While freon randomkeys was running, put all SCM nodes under blockade and stop the leader SCM node:
> blockade status:
> NODE            CONTAINER ID    STATUS  IP              NETWORK    PARTITION  
>                 18f9c1e2d52f    UP      172.31.0.9      NORMAL                
> ozone-ha_scm1_1                                                                
>                 25c74f0a9271    UP      172.31.0.6      NORMAL                
> ozone-ha_scm2_1                                                                
>                 8808d10ccb3a    DOWN                    UNKNOWN
> freon randomkeys failed with the following error message.
> Part of the test output follows:
> 6:00:30,131 [pool-2-thread-3] ERROR freon.RandomKeyGenerator: Exception while adding key: key-21-80493 in bucket: bucket-44-63818 of volume: vol-1-95998.
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No Route to Host from  om1/172.31.0.11 to scm3:9863 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  [http://wiki.apache.org/hadoop/NoRouteToHost]
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:604)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.openKey(OzoneManagerProtocolClientSideTranslatorPB.java:595)
> at org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:756)
> at org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:502)
> at org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:703)
> at org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:86)
> at org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:621)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>  44.10% |?????????????????????????????????????????????                                                        |  11024/25000 Time: 0:09:00
> 2021-05-11 06:00:37,231 [pool-2-thread-7] INFO metrics.RatisMetrics: Creating Metrics Registry : ratis.client_message_metrics.client-EA7B54107DBD->4c51bca8-cc0a-4c20-84dd-b5a7cb18c4ac
> 2021-05-11 06:00:37,231 [pool-2-thread-7] WARN impl.MetricRegistriesImpl: First MetricRegistry has been created without registering reporters. You may need to call MetricRegistries.global().addReporterRegistration(...) before.
>  100.00% |?????????????????????????????????????????????????????????????????????????????????????????????????????|  25000/25000 Time: 0:15:39
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No Route to Host from  om1/172.31.0.11 to scm3:9863 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  [http://wiki.apache.org/hadoop/NoRouteToHost]
> ***************************************************
> Status: Failed
> Git Base Revision: 7a3bc90b05f257c8ace2f76d74264906f0f7a932
> Number of Volumes created: 10
> Number of Buckets created: 500
> Number of Keys added: 24991
> Ratis replication factor: THREE
> Ratis replication type: RATIS
> Average Time spent in volume creation: 00:00:00,114
> Average Time spent in bucket creation: 00:00:01,263
> Average Time spent in key creation: 00:02:48,698
> Average Time spent in key write: 00:00:04,216
> Total bytes written: 255907840
> Total Execution time: 00:15:39,968
> ***************************************************
> In this case, I would expect the freon test to still finish successfully.
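
The summary can be read against the command line with a little arithmetic. Assuming --numOfBuckets is counted per volume and --numOfKeys per bucket (an assumption, though it matches the reported 500 buckets and the 25000 total in the progress bar), the run should have added 25000 keys, so 24991 added leaves 9 keys that were not written. A minimal Python sketch of that check follows; the function and variable names are illustrative and are not part of freon.

```python
# Figures copied from the freon command line and the summary in the report.
NUM_VOLUMES = 10
BUCKETS_PER_VOLUME = 50   # assumption: --numOfBuckets is per volume
KEYS_PER_BUCKET = 50      # assumption: --numOfKeys is per bucket
KEYS_ADDED = 24991        # "Number of Keys added" in the summary


def expected_counts(volumes, buckets_per_volume, keys_per_bucket):
    """Return (total_buckets, total_keys) expected from a randomkeys run."""
    total_buckets = volumes * buckets_per_volume
    total_keys = total_buckets * keys_per_bucket
    return total_buckets, total_keys


total_buckets, total_keys = expected_counts(
    NUM_VOLUMES, BUCKETS_PER_VOLUME, KEYS_PER_BUCKET
)
missing_keys = total_keys - KEYS_ADDED

print(f"expected buckets: {total_buckets}")
print(f"expected keys:    {total_keys}")
print(f"keys not added:   {missing_keys}")
```

With the numbers above this reports 500 buckets, 25000 keys, and 9 keys not added, which is consistent with the "Status: Failed" line even though the progress bar reached 100%.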


