Posted to hdfs-dev@hadoop.apache.org by "James Clampffer (JIRA)" <ji...@apache.org> on 2016/06/07 16:39:21 UTC

[jira] [Reopened] (HDFS-9890) libhdfs++: Add test suite to simulate network issues

     [ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer reopened HDFS-9890:
-----------------------------------

The commit ended up having merge issues. "git apply -3" worked fine, but some of the changes weren't compatible with the current codebase.

I've reverted the change on HDFS-8707 and will work on getting a good rebase posted.

> libhdfs++: Add test suite to simulate network issues
> ----------------------------------------------------
>
>                 Key: HDFS-9890
>                 URL: https://issues.apache.org/jira/browse/HDFS-9890
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: Xiaowei Zhu
>         Attachments: HDFS-9890.HDFS-8707.000.patch, HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch, HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch, HDFS-9890.HDFS-8707.007.patch, hs_err_pid26832.log, hs_err_pid4944.log
>
>
> I propose adding a test suite to simulate various network issues/failures in order to get good test coverage on some of the retry paths that aren't easy to hit in mock unit tests.
> At the moment the only things that hit the retry paths are the gmock unit tests.  The gmock tests are only as good as their mock implementations, which do a great job of simulating protocol correctness but not more complex interactions.  They also can't really simulate the types of lock contention and subtle memory stomps that show up while doing hundreds or thousands of concurrent reads.  We should add a new minidfscluster test that focuses on heavy read/seek load and randomly converts the error codes returned by network functions into errors.
> List of things to simulate (while heavily loaded), roughly in order of how badly I think they need to be tested at the moment:
> -Rpc connection disconnect
> -Rpc connection slowed down enough to cause a timeout and trigger retry
> -DN connection disconnect


