Posted to dev@kafka.apache.org by Viktor Somogyi <vi...@gmail.com> on 2017/09/01 12:07:00 UTC
Re: Fault Injection
Hi Colin,
I'd be interested in this. I think it would be a valuable thing for the
community to have, and it would greatly increase the test coverage.
I saw you already have a PR; I'll review it as I have time :).
Viktor
On Tue, Aug 22, 2017 at 9:36 PM, Timothy Chen <tn...@gmail.com> wrote:
> Hi Colin,
>
> The Kibosh code is just a README for now. Is it going to be published soon?
>
> Tim
>
> On Tue, Aug 22, 2017 at 11:44 AM, Colin McCabe <cm...@apache.org> wrote:
> > Hi all,
> >
> > I've been working on a fault injector for Apache Kafka. The general
> > idea is to create faults such as network partitions or disk failures,
> > and see what happens in the cluster. The fault injector can run as part
> > of a ducktape system test, or standalone.
> >
> > The fault injector has two processes: a coordinator, and an agent. The
> > agent process is responsible for actually implementing the faults. For
> > example, it might run iptables, send signals to processes, generate a
> > lot of load, or do something else to disrupt the computer it is running
> > on. We run an agent process on each node where we would like to
> > potentially inject faults. So it will run alongside the brokers,
> > zookeeper nodes, etc.
> >
> > The coordinator process is responsible for communicating with the agent
> > processes and for scheduling faults. For example, the coordinator can
> > be instructed to create a fault immediately on several nodes. Or it can
> > be instructed to create faults over time, based on a pseudorandom seed.
> > Both the coordinator and the agent expose a REST interface that accepts
> > objects serialized via JSON.
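
To make the seed-based scheduling idea concrete, here is a minimal sketch of how a coordinator might derive a reproducible fault schedule from a pseudorandom seed and serialize it as JSON. Everything in it is an assumption made for illustration: the function name build_schedule, the field names (node, type, startMs), and the fault type strings are hypothetical and are not taken from the actual fault injector code or its REST API.

```python
import json
import random


def build_schedule(seed, nodes, fault_types, count, max_delay_ms):
    """Return a reproducible list of fault specs derived from `seed`.

    Hypothetical helper: the field names below are illustrative only.
    """
    # A private generator keeps the schedule independent of global RNG state.
    rng = random.Random(seed)
    schedule = []
    start_ms = 0
    for _ in range(count):
        # Each fault starts some random delay after the previous one.
        start_ms += rng.randint(1, max_delay_ms)
        schedule.append({
            "node": rng.choice(nodes),
            "type": rng.choice(fault_types),
            "startMs": start_ms,
        })
    return schedule


nodes = ["broker1", "broker2", "zk1"]
fault_types = ["network_partition", "disk_error"]
schedule = build_schedule(42, nodes, fault_types, count=5, max_delay_ms=10_000)

# The JSON text is what a coordinator could send to, or accept from, a client.
payload = json.dumps(schedule, indent=2)
print(payload)
```

Because the generator is seeded, rerunning with the same seed yields the same schedule, which is what makes a failure found under random fault injection repeatable.
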
> >
> > I think two kinds of faults will be especially interesting: network
> > faults, and disk errors. Simulating network faults in a Linux
> > environment is relatively straightforward using iptables. Disk errors
> > are tougher to simulate, but I have written a FUSE filesystem to do
> > this. The filesystem essentially simulates a bind mount in most cases,
> > but it can take a JSON specification telling it to inject certain
> > faults. (Disk errors seem especially relevant to the ongoing work on
> > JBOD.)
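
As a rough illustration of the iptables approach to network faults, the sketch below builds (but does not execute) the commands an agent could run to drop inbound traffic from a set of peers, and the matching commands to undo the fault. The helper name partition_commands is hypothetical, and the choice to drop only on the INPUT chain is an assumption; the real agent may use different rules. The iptables options used (-A, -D, -s, -j DROP) are standard ones.

```python
import ipaddress


def partition_commands(peer_ips, undo=False):
    """Build iptables argument lists that drop inbound packets from peers.

    Hypothetical helper: returns commands only, it never runs them.
    """
    # -A appends a rule to the chain; -D deletes the matching rule again.
    action = "-D" if undo else "-A"
    commands = []
    for ip in peer_ips:
        # Validate the address first; malformed input raises ValueError.
        addr = str(ipaddress.ip_address(ip))
        commands.append(["iptables", action, "INPUT", "-s", addr, "-j", "DROP"])
    return commands


for cmd in partition_commands(["10.0.0.2", "10.0.0.3"]):
    print(" ".join(cmd))
```

Returning argument lists rather than shell strings means they can be passed to subprocess.run without shell quoting concerns, and the undo form makes it straightforward to heal the partition when the fault ends. Running the commands for real requires root privileges on the node.
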
> >
> > Although it's not a user-visible component, I think a fault injector
> > will be valuable for Kafka users, because it will help us stress test
> > Kafka in more situations. I'm going to post some patches in a day or
> > two, and it would be great to get some feedback. Check out
> > https://cwiki.apache.org/confluence/display/KAFKA/Fault+Injection
> >
> > best,
> > Colin
>
Re: Fault Injection
Posted by Colin McCabe <cm...@apache.org>.
Thanks, Viktor. Also check out the fault injection umbrella JIRA here:
https://issues.apache.org/jira/browse/KAFKA-5775 with more subtasks.
cheers,
Colin