Posted to hdfs-dev@hadoop.apache.org by Sergey Shelukhin <Se...@microsoft.com.INVALID> on 2019/02/11 23:42:09 UTC

DFSClient/DistributedFileSystem fault injection?

Hi.
I've been looking for a client-side solution for fault injection in HDFS.
We had a naturally unstable HDFS cluster that helped uncover a lot of issues in HBase; now that it has been stabilized, we miss it already :)

To keep testing without actually disrupting others' use of HDFS or having to deploy a new version, I was thinking about having a client-side scheme (e.g. fhdfs) map to a wrapper over the standard DFS that would inject failures and delays according to some configs, similar to https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html

However, I wonder if something like this already exists?
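
A minimal sketch of the wrapper idea above, using only the JDK so it stands alone. It is not Hadoop code and does not reflect any existing HDFS API: the class name FaultInjectingInputStream and the failureProbability and delayMillis parameters are hypothetical stand-ins for "some configs". A real fhdfs wrapper would presumably apply the same pattern to the streams and calls of the wrapped file system.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

/** Hypothetical wrapper: delegates reads, but may first sleep and/or throw. */
class FaultInjectingInputStream extends FilterInputStream {
    private final double failureProbability; // chance in [0, 1] that a read throws
    private final long delayMillis;          // pause added before every read
    private final Random random;             // injectable so tests can be deterministic

    FaultInjectingInputStream(InputStream in, double failureProbability,
                              long delayMillis, Random random) {
        super(in);
        this.failureProbability = failureProbability;
        this.delayMillis = delayMillis;
        this.random = random;
    }

    /** Applies the configured delay, then fails with the configured probability. */
    private void maybeInject() throws IOException {
        if (delayMillis > 0) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted during injected delay", e);
            }
        }
        if (random.nextDouble() < failureProbability) {
            throw new IOException("injected read failure");
        }
    }

    @Override
    public int read() throws IOException {
        maybeInject();
        return super.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        maybeInject();
        return super.read(buf, off, len);
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        // Each read waits 10 ms and fails about half of the time.
        try (InputStream in = new FaultInjectingInputStream(
                new ByteArrayInputStream(payload), 0.5, 10, new Random())) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.println("read " + b);
            }
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Passing the Random in from outside is a deliberate choice: a fixed seed, or a probability of 0.0 or 1.0, makes a test run repeatable.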

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


RE: DFSClient/DistributedFileSystem fault injection?

Posted by Sergey Shelukhin <Se...@microsoft.com.INVALID>.
Adding the user list :)


Re: DFSClient/DistributedFileSystem fault injection?

Posted by Wei-Chiu Chuang <we...@cloudera.com.INVALID>.
For what it's worth, at Cloudera we have an internal fault injection tool
that is based on Chaos Monkey.
We use it to kill disks or kill nodes, for example, so it doesn't
instrument HDFS directly.


RE: DFSClient/DistributedFileSystem fault injection?

Posted by Sergey Shelukhin <Se...@microsoft.com.INVALID>.
Yeah, I'm trying to do the injection client-side to avoid disruption to other users (and having to deploy/reconfigure HDFS).
I was hoping someone had already created that :) We will probably create it at some point and may try to submit a patch later.

-----Original Message-----
From: Stephen Loughran <st...@cloudera.com.INVALID> 
Sent: Tuesday, February 12, 2019 3:33 PM
To: Sergey Shelukhin <Se...@microsoft.com.invalid>
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: DFSClient/DistributedFileSystem fault injection?

Sergey - are you trying to simulate failures client-side, or do you have an NN which actually injects failures all the way up the IPC stack?

If it's just the client, couldn't registering a fault-injecting client as fs.hdfs.impl do that?

FWIW, in the s3a connector we have the "inconsistent" S3 client, which mimics some symptoms of delayed consistency; it has a path, a probability of happening, and a delay before things become visible. This is in the main hadoop-aws JAR and is turned on by a configuration switch (yes, it prints a big warning). With a single switch to turn it on, it's trivial to enable it in tests.
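
The three knobs mentioned above (a path, a probability, and a delay before things become visible) can be illustrated with a small JDK-only sketch. This is not the hadoop-aws implementation: DelayedVisibilityStore, its fields, and the in-memory map standing in for the object store are all hypothetical, and the clock is passed in only so the behaviour can be checked without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.LongSupplier;

/** Hypothetical in-memory store whose new entries may stay hidden for a while. */
class DelayedVisibilityStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Long> visibleAt = new HashMap<>();
    private final String pathSubstring; // only keys containing this are affected
    private final double probability;   // chance that an affected put is delayed
    private final long delayMillis;     // how long a delayed entry stays hidden
    private final Random random;
    private final LongSupplier clock;   // supplies the current time in milliseconds

    DelayedVisibilityStore(String pathSubstring, double probability, long delayMillis,
                           Random random, LongSupplier clock) {
        this.pathSubstring = pathSubstring;
        this.probability = probability;
        this.delayMillis = delayMillis;
        this.random = random;
        this.clock = clock;
    }

    /** Stores the value and records when readers are allowed to see it. */
    void put(String key, String value) {
        long now = clock.getAsLong();
        boolean delayed = key.contains(pathSubstring)
                && random.nextDouble() < probability;
        data.put(key, value);
        visibleAt.put(key, delayed ? now + delayMillis : now);
    }

    /** Returns the value, or null if the key is absent or not yet visible. */
    String get(String key) {
        Long when = visibleAt.get(key);
        if (when == null || clock.getAsLong() < when) {
            return null;
        }
        return data.get(key);
    }

    public static void main(String[] args) {
        // Keys containing "DELAY" are always hidden for 200 ms after a put.
        DelayedVisibilityStore store = new DelayedVisibilityStore(
                "DELAY", 1.0, 200, new Random(), System::currentTimeMillis);
        store.put("/test/DELAY/part-0", "payload");
        System.out.println("immediately: " + store.get("/test/DELAY/part-0"));
    }
}
```

Restricting the effect to keys that contain a marker substring keeps the rest of a test run unaffected, so only the paths a test opts into see the delayed behaviour.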
