Posted to issues@ozone.apache.org by "Rui Wang (Jira)" <ji...@apache.org> on 2020/09/14 03:25:00 UTC

[jira] [Updated] (HDDS-4237) Testing Infrastructure for network partitioning

     [ https://issues.apache.org/jira/browse/HDDS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated HDDS-4237:
---------------------------
    Description: 
Network partitioning can cause a split-brain scenario in which two leaders exist at the same time. We need a testing infrastructure/framework that simulates this case and verifies whether our SCM HA implementation achieves strong consistency.

Mukul Kumar Singh suggested two possible approaches:

a) Blockade tests: Blockade is a Docker-based framework in which the
network for one DN can be isolated from the others.

b) MiniOzoneChaosCluster: a unit-test-based test in which a random
datanode is killed; this has helped uncover consistency issues.


We might need a similar solution for SCM: block the SCM leader's network and also increase the timeout so that the old leader does not turn into a candidate.
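
The scenario above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions: `ToyCluster`, `Node`, `run_scenario`, and the node names `scm1` to `scm3` are hypothetical and are not part of Ozone, Ratis, Blockade, or MiniOzoneChaosCluster. The sketch assumes majority-quorum elections and writes, and it models the "increased timeout" simply by never letting the isolated leader step down or campaign.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    term: int = 0
    is_leader: bool = False
    log: list = field(default_factory=list)


class ToyCluster:
    """Hypothetical in-memory stand-in for an SCM HA group (not a real API)."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.blocked = set()  # nodes whose network is cut off from all peers

    def reachable(self, a, b):
        return a == b or (a not in self.blocked and b not in self.blocked)

    def partition(self, name):
        # Isolate one node; it keeps believing whatever it believed before.
        self.blocked.add(name)

    def peers(self, name):
        return [n for n in self.nodes.values() if self.reachable(name, n.name)]

    def elect(self, name):
        # A candidate wins only if it can reach a majority of the group.
        voters = self.peers(name)
        if len(voters) * 2 <= len(self.nodes):
            return False
        new_term = max(n.term for n in voters) + 1
        for n in voters:
            n.term = new_term
            n.is_leader = n.name == name
        return True

    def write(self, name, entry):
        # A write commits only if a majority in the leader's term acknowledges it.
        leader = self.nodes[name]
        if not leader.is_leader:
            return False
        acks = [n for n in self.peers(name) if n.term <= leader.term]
        if len(acks) * 2 <= len(self.nodes):
            return False
        for n in acks:
            n.term = leader.term
            n.log.append((leader.term, entry))
        return True

    def leaders(self):
        return sorted(n.name for n in self.nodes.values() if n.is_leader)


def run_scenario():
    cluster = ToyCluster(["scm1", "scm2", "scm3"])
    cluster.elect("scm1")
    cluster.write("scm1", "a")
    cluster.partition("scm1")      # block the leader's network
    cluster.elect("scm2")          # the majority side elects a new leader
    return {
        "leaders": cluster.leaders(),                # two nodes believe they lead
        "stale_write": cluster.write("scm1", "x"),   # old leader cannot commit
        "new_write": cluster.write("scm2", "b"),     # new leader can commit
        "logs": {n: node.log for n, node in cluster.nodes.items()},
    }
```

In this model the isolated node still reports itself as leader, which is the split-brain condition, but its write is rejected because it cannot reach a majority. A real test would need to make the equivalent check against the actual SCM HA implementation rather than against this toy.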

  was:
Network partitioning can cause a split-brain scenario in which two leaders exist at the same time. We need a testing infrastructure/framework that simulates this case and verifies whether our SCM HA implementation achieves strong consistency.

Mukul Kumar Singh suggested two possible approaches:

a) Blockade tests: Blockade is a Docker-based framework in which the
network for one DN can be isolated from the others.

b) MiniOzoneChaosCluster: a unit-test-based test in which a random
datanode is killed; this has helped uncover consistency issues.


> Testing Infrastructure for network partitioning
> -----------------------------------------------
>
>                 Key: HDDS-4237
>                 URL: https://issues.apache.org/jira/browse/HDDS-4237
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Priority: Major
>
> Network partitioning can cause a split-brain scenario in which two leaders exist at the same time. We need a testing infrastructure/framework that simulates this case and verifies whether our SCM HA implementation achieves strong consistency.
> Mukul Kumar Singh suggested two possible approaches:
> a) Blockade tests: Blockade is a Docker-based framework in which the
> network for one DN can be isolated from the others.
> b) MiniOzoneChaosCluster: a unit-test-based test in which a random
> datanode is killed; this has helped uncover consistency issues.
> We might need a similar solution for SCM: block the SCM leader's network and also increase the timeout so that the old leader does not turn into a candidate.


