Posted to commits@kudu.apache.org by al...@apache.org on 2017/05/05 05:45:05 UTC

kudu git commit: [kudu-jepsen] added more info on troubleshooting

Repository: kudu
Updated Branches:
  refs/heads/master e7334c2e6 -> 46a886cd2


[kudu-jepsen] added more info on troubleshooting

Added more information on distinguishing 'errors' from 'failures' in the
kudu-jepsen test output.

Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b
Reviewed-on: http://gerrit.cloudera.org:8080/6774
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <da...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/46a886cd
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/46a886cd
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/46a886cd

Branch: refs/heads/master
Commit: 46a886cd22e86d3fdfee62aa6115054138db3a33
Parents: e7334c2
Author: Alexey Serbin <as...@cloudera.com>
Authored: Mon May 1 14:16:12 2017 -0700
Committer: Alexey Serbin <as...@cloudera.com>
Committed: Fri May 5 05:44:09 2017 +0000

----------------------------------------------------------------------
 java/kudu-jepsen/README.adoc | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/46a886cd/java/kudu-jepsen/README.adoc
----------------------------------------------------------------------
diff --git a/java/kudu-jepsen/README.adoc b/java/kudu-jepsen/README.adoc
index 4c32428..7dfaaee 100644
--- a/java/kudu-jepsen/README.adoc
+++ b/java/kudu-jepsen/README.adoc
@@ -21,7 +21,7 @@
 
 A link:http://clojure.org[Clojure] library designed to run
 link:http://kudu.apache.org[Apache Kudu] consistency tests using
-the link:https://aphyr.com/tags/Jepsen[Jepsen] framework. Curently, a simple
+the link:https://aphyr.com/tags/Jepsen[Jepsen] framework. Currently, a simple
 linearizability test for read/write register is implemented and run
 for several fault injection scenarios.
 
@@ -102,16 +102,30 @@ In the Jepsen terminology, Kudu master and tserver nodes are playing
 is run plays *Jepsen control node* role.
 
 === Troubleshooting
+When Jepsen's analysis doesn't find inconsistencies in the history of operations,
+it outputs the following at the end of a test:
+[listing]
+----
+Everything looks good! ヽ(‘ー`)ノ
+----
+
+However, a test does not always end this way. If it does not, it's crucial to
+understand why the test failed.
+
 The majority of the kudu-jepsen test failures can be put into two classification
 buckets:
 
 * An error happened while setting up the testing environment, contacting
-  machines at the Kudu cluster, starting up Kudu server-side components, etc.
+  machines at the Kudu cluster, starting up Kudu server-side components, or in
+  any of the third-party components that Jepsen uses (like clj-ssh), etc.
 * The Jepsen's analysis detected inconsistent history of operations.
 
 The former class of failures might be a manifestation of wrong configuration,
-a problem with the test environment or a bug in the test code itself.
-Those issues manifest themselves in messages like the following:
+a problem with the test environment, a bug in the test code itself or some
+other intermittent failure. Usually, encountering issues like that means the
+consistency analysis (which is the last step of a test scenario) cannot run.
+Such issues are reported as _errors_ in the summary message. E.g., the example
+summary message below reports 10 errors in 10 tests run:
 [listing]
 ----
 21:41:42 Ran 10  tests containing 10 assertions.
@@ -121,9 +135,18 @@ To get more details, take a closer look at the output of `mvn clojure:run`
 or at particular `jepsen.log` files under
 $KUDU_HOME/java/kudu-jepsen/store/rw-register/<test_timestamp> directories.
 
-The latter class of failures represents more serious issues: manifestations
-of non-linearizable history of operations. If Jepsen finds a such an
-inconsistency, it outputs something like the the following into the log:
+The latter class represents a more serious issue: a manifestation of a
+non-linearizable history of operations. This is reported as a _failure_ in the
+summary message. E.g., the summary message below reports finding 2 instances
+of non-linearizable history among 10 tests run:
+[listing]
+----
+22:21:52 Ran 10  tests containing 10 assertions.
+22:21:52 2 failures, 0 errors.
+----
+
+If Jepsen's analysis finds a non-linearizable history of operations, it outputs
+the following at the end of a test:
 [listing]
 ----
 Analysis invalid! (ノಥ益ಥ)ノ ┻━┻