Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2014/06/25 17:28:07 UTC
Oak resilience
Hi,
We should start thinking about resilience goals for Oak and how to
verify them. To get this started, I tried to come up with a definition
of what we mean by Oak being resilient [1]. Please have a look and let
me know whether this makes sense, what I missed, and what I got wrong.
In addition, I created a JIRA issue [2], which should serve as a
container for verifying the individual goals once we have agreed on
them.
Michael
[1] https://wiki.apache.org/jackrabbit/Resilience
[2] https://issues.apache.org/jira/browse/OAK-1844
Re: Oak resilience
Posted by Michael Dürig <md...@apache.org>.
Hi,
To follow up on this, I was thinking of implementing a couple of typical
tests, such as:
- Oak runs out of disk space during operation: verify that Oak comes up
again and that there is no corruption or data loss after recovering
from the out-of-disk-space condition.
- Data corruption on disk: verify that the damage does not escalate
beyond the initially corrupted data, that Oak becomes operational again
once the corruption has been fixed, and that no data is lost beyond the
data immediately affected by the corruption.
- Lost connection to database: verify that Oak becomes operational again
once the connection is re-established and that there is no data loss
beyond the scope of the lost connection.
Is anyone aware of tooling that could be of help for implementing such
tests? The general pattern is:
- bring up Oak,
- simulate some access pattern,
- disrupt parts of the system (disk, process, network, memory),
- recover from the disruption,
- optionally apply some recovery procedure on Oak,
- assert that Oak is operational again and that the damage done does not
escalate beyond the immediate scope.
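
To make the pattern above a bit more concrete, here is a minimal Java
sketch of the out-of-disk-space case. Everything in it is an assumption
for illustration: ToyStore and ResilienceScenario are hypothetical
stand-ins and not Oak APIs, and the disruption is simulated with a flag
rather than a real fault on disk. A real test would replace ToyStore
with an actual Oak setup and a real fault injection mechanism.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a repository. This is NOT an Oak API. */
class ToyStore {
    private final List<String> committed = new ArrayList<>();
    private boolean diskFull = false;

    /** Simulates the disruption: while true, every commit fails. */
    void setDiskFull(boolean full) {
        diskFull = full;
    }

    /** Commits a value, or fails without side effects if the "disk" is full. */
    void commit(String value) throws IOException {
        if (diskFull) {
            throw new IOException("no space left on device (simulated)");
        }
        committed.add(value);
    }

    /** Returns a copy of everything committed so far. */
    List<String> read() {
        return new ArrayList<>(committed);
    }
}

class ResilienceScenario {
    /** Attempts a commit and records the value only if it was acknowledged. */
    static boolean tryCommit(ToyStore store, String value, List<String> acknowledged) {
        try {
            store.commit(value);
            acknowledged.add(value);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Runs the scenario; returns true if all assertions of the pattern hold. */
    static boolean run() {
        ToyStore store = new ToyStore();                 // bring up the system
        List<String> acknowledged = new ArrayList<>();

        for (int i = 0; i < 5; i++) {                    // simulate an access pattern
            tryCommit(store, "before-" + i, acknowledged);
        }

        store.setDiskFull(true);                         // disrupt
        int rejected = 0;
        for (int i = 0; i < 3; i++) {
            if (!tryCommit(store, "during-" + i, acknowledged)) {
                rejected++;
            }
        }

        store.setDiskFull(false);                        // recover from the disruption
        // (an optional recovery procedure would go here; the toy store needs none)

        boolean operational = tryCommit(store, "after", acknowledged);
        boolean noLoss = store.read().equals(acknowledged);

        // Damage stays within scope: only writes during the disruption failed.
        return operational && noLoss && rejected == 3;
    }

    public static void main(String[] args) {
        System.out.println("scenario passed: " + run());
    }
}
```

The idea is to track only the writes that were acknowledged and to
compare them with what can be read back after recovery, so that any
loss beyond the scope of the disruption shows up as a mismatch.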
Michael