Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2014/06/25 17:28:07 UTC

Oak resilience

Hi,

We should start thinking about resilience goals for Oak and how to
verify them. To get this started, I tried to come up with a definition
of what we mean by Oak being resilient [1]. Please have a look and let
me know whether this makes sense, what I missed and what I got wrong.

In addition, I created a JIRA issue [2], which should serve as a
container for verifying the individual goals once we have agreed
on them.

Michael

[1] https://wiki.apache.org/jackrabbit/Resilience
[2] https://issues.apache.org/jira/browse/OAK-1844

Re: Oak resilience

Posted by Michael Dürig <md...@apache.org>.
Hi,

To follow up on this, I was thinking of implementing a couple of typical
tests, such as:

- Oak runs out of disk space during operation: verify that Oak comes up
again and that there is no corruption or data loss after recovering from
the out-of-disk condition.

- Data corruption on disk: verify that the damage does not escalate
beyond the initially corrupted data, that Oak becomes operational again
once the corruption has been fixed, and that no data is lost beyond
the data directly affected by the corruption.

- Lost connection to database: verify that Oak becomes operational again 
once the connection is re-established and that there is no data loss 
beyond the scope of the lost connection.
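
A minimal sketch of the assertions behind the out-of-disk test, assuming a hypothetical quota-limited, append-only journal as a stand-in for a real Oak store. QuotaJournal and OutOfDiskCheck are illustrative names, not Oak APIs, and the quota is counted in characters as a proxy for bytes. The point it shows: only track writes that were acknowledged, check that none of them were lost when the "disk" filled up, then free space and check that writes succeed again. It needs Java 10 or later for List.copyOf.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only journal whose quota simulates a disk that can fill up. Not an Oak API. */
class QuotaJournal {
    private final List<String> committed = new ArrayList<>();
    private long quota;     // capacity in characters (stand-in for bytes)
    private long used;      // characters consumed so far

    QuotaJournal(long quota) { this.quota = quota; }

    /** Appends a record atomically: it either fits entirely or nothing is written. */
    void commit(String record) throws IOException {
        if (used + record.length() > quota) {
            throw new IOException("No space left on device (simulated)");
        }
        committed.add(record);
        used += record.length();
    }

    /** Simulates an operator freeing disk space. */
    void growQuota(long extra) { quota += extra; }

    List<String> read() { return List.copyOf(committed); }
}

class OutOfDiskCheck {
    /** Returns true if acknowledged data survives the out-of-disk condition and writes resume afterwards. */
    static boolean run() {
        QuotaJournal journal = new QuotaJournal(10);
        List<String> acknowledged = new ArrayList<>();

        // 1. Simulate an access pattern until the "disk" fills up.
        for (String rec : List.of("aaaa", "bbbb", "cccc")) {
            try {
                journal.commit(rec);
                acknowledged.add(rec); // only writes reported as successful count
            } catch (IOException expected) {
                // the third record exceeds the quota and is rejected
            }
        }

        // 2. No acknowledged write may have been lost or corrupted.
        if (!journal.read().equals(acknowledged)) return false;

        // 3. Recover from the condition and verify that Oak-like writes succeed again.
        journal.growQuota(100);
        try {
            journal.commit("dddd");
        } catch (IOException e) {
            return false;
        }
        return journal.read().size() == acknowledged.size() + 1;
    }
}
```

Against a real repository, the journal would be replaced by an actual Oak setup with a constrained file system, but the three assertions would stay the same.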

Is anyone aware of tooling that could help with implementing such
tests? The general pattern is:

- bring up Oak,
- simulate some access pattern,
- disrupt parts of the system (disk, process, network, memory),
- recover from the disruption,
- optionally apply some recovery procedure on Oak,
- assert that Oak is operational again and that the damage does not
escalate beyond the immediate scope of the disruption.
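
The general pattern above could be captured as a small harness that runs the phases in a fixed order. This is a sketch under assumptions: ResilienceScenario, ResilienceRunner and LostConnectionScenario are hypothetical names, not existing Oak or JUnit APIs, and the toy scenario only simulates a dropped database connection with a boolean flag rather than driving a real Oak instance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract for one resilience scenario; mirrors the steps of the general pattern. */
interface ResilienceScenario {
    void bringUp() throws Exception;                           // start Oak
    void simulateAccess() throws Exception;                    // drive a representative workload
    void disrupt() throws Exception;                           // disk, process, network or memory fault
    void recover() throws Exception;                           // remove the fault
    default void applyRecoveryProcedure() throws Exception {}  // optional repair step on Oak
    void assertOperational() throws Exception;                 // operational again, damage contained
}

final class ResilienceRunner {
    /** Runs all phases in order; any failure propagates, otherwise returns the completed phases. */
    static List<String> run(ResilienceScenario s) throws Exception {
        List<String> completed = new ArrayList<>();
        s.bringUp();                completed.add("bringUp");
        s.simulateAccess();         completed.add("simulateAccess");
        s.disrupt();                completed.add("disrupt");
        s.recover();                completed.add("recover");
        s.applyRecoveryProcedure(); completed.add("applyRecoveryProcedure");
        s.assertOperational();      completed.add("assertOperational");
        return completed;
    }
}

/** Toy scenario: a backend connection that drops and is re-established. */
final class LostConnectionScenario implements ResilienceScenario {
    private boolean connected;
    private final List<String> stored = new ArrayList<>(); // contents of the toy backend

    /** Stores a value; fails while the simulated connection is down. */
    private void write(String value) {
        if (!connected) throw new IllegalStateException("connection lost (simulated)");
        stored.add(value);
    }

    public void bringUp() { connected = true; }
    public void simulateAccess() { write("a"); write("b"); }
    public void disrupt() {
        connected = false;
        try { write("c"); } catch (IllegalStateException expected) { /* rejected, never stored */ }
    }
    public void recover() { connected = true; }
    public void assertOperational() {
        write("d"); // writes must succeed again after recovery
        // Data written before the disruption survives and the failed write left nothing behind.
        if (!stored.equals(List.of("a", "b", "d"))) throw new AssertionError("unexpected state: " + stored);
    }
}
```

Keeping the phases behind one interface would let the disk, corruption and connection tests share a single runner, with only the disrupt and recover steps differing per scenario.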

Michael
