You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/08/28 06:15:31 UTC

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Hello Dan Burkert, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7853

to review the following change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................

WIP: use C++ ExternalMiniCluster for Java and Python tests

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authz came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authz support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "run_cluster" mode to the
Kudu CLI. When invoked, it spawns an ExternalMiniCluster and provides a
simple, machine-readable shell over stdin/stdout. The shell responds to
commands by manipulating the cluster and its daemons, and kills them when
the shell client disconnects. As a proof of concept, the patch also replaces
the bespoke Java mini cluster with callouts to the new shell.

I should add that I like the idea of shipping "run_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

WIP because, well, it should be pretty obvious. I was able to get through a
full run of "mvn verify" locally, so I have confidence that this can work.
But I'd like to solicit feedback on the general approach before spending
more time applying spit and polish.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
D java/kudu-client/src/test/java/org/apache/kudu/client/MiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
D java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMultipleLeaderFailover.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
D java/kudu-client/src/test/resources/flags
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
14 files changed, 532 insertions(+), 1,210 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/CMakeLists.txt@120
PS8, Line 120: mini-cluster
> This should use an underscore to be consistent with the rest of our librari
nevermind, bad suggestion



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 04 Oct 2017 14:11:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#3).

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................

WIP: use C++ ExternalMiniCluster for Java and Python tests

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authz came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authz support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "shell" mode to the Kudu CLI
which provides a rudimentary control shell that can be used to spin up an
ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The first cut of the protocol was sh-like, with simple
word-based commands. It was then revised into a PB-based protocol (with
optional JSON encoding) based on feedback. As a proof of concept, the patch
also replaces the bespoke Java mini cluster with callouts to the new shell.

I should add that I like the idea of shipping "shell" into production as
part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

WIP because, well, it should be pretty obvious. I was able to get through a
full run of "mvn verify" locally, so I have confidence that this can work.
But I'd like to solicit feedback on the general approach before spending
more time applying spit and polish.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
D java/kudu-client/src/test/java/org/apache/kudu/client/MiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
D java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMultipleLeaderFailover.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
D java/kudu-client/src/test/resources/flags
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
17 files changed, 919 insertions(+), 1,256 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#8).

Change subject: tool: add cluster shell action
......................................................................

tool: add cluster shell action

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "shell" action to the Kudu
CLI which provides a rudimentary control shell that can be used to spin up
an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "shell" into production as
part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
9 files changed, 1,129 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#2).

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................

WIP: use C++ ExternalMiniCluster for Java and Python tests

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authz came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authz support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "run_cluster" mode to the
Kudu CLI. When invoked, it spawns an ExternalMiniCluster and provides a
simple, machine-readable shell over stdin/stdout. The shell responds to
commands by manipulating the cluster and its daemons, and kills them when
the shell client disconnects. As a proof of concept, the patch also replaces
the bespoke Java mini cluster with callouts to the new shell.

I should add that I like the idea of shipping "run_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

WIP because, well, it should be pretty obvious. I was able to get through a
full run of "mvn verify" locally, so I have confidence that this can work.
But I'd like to solicit feedback on the general approach before spending
more time applying spit and polish.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
D java/kudu-client/src/test/java/org/apache/kudu/client/MiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
D java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMultipleLeaderFailover.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
D java/kudu-client/src/test/resources/flags
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
15 files changed, 532 insertions(+), 1,211 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: new action for running mini-clusters

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................


Patch Set 13: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 13
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 06 Oct 2017 22:15:18 +0000
Gerrit-HasComments: No

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#12).

Change subject: tool: new action for running mini-clusters
......................................................................

tool: new action for running mini-clusters

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "mini_cluster" action to the
Kudu CLI which provides a rudimentary control shell that can be used to spin
up an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "mini_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
10 files changed, 1,173 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/12
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 12
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#6).

Change subject: tool: add cluster shell action
......................................................................

tool: add cluster shell action

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "shell" action to the Kudu
CLI which provides a rudimentary control shell that can be used to spin up
an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "shell" into production as
part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
9 files changed, 1,104 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#11).

Change subject: tool: new action for running mini-clusters
......................................................................

tool: new action for running mini-clusters

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "mini_cluster" action to the
Kudu CLI which provides a rudimentary control shell that can be used to spin
up an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "mini_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
10 files changed, 1,175 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/11
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 11
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#7).

Change subject: tool: add cluster shell action
......................................................................

tool: add cluster shell action

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "shell" action to the Kudu
CLI which provides a rudimentary control shell that can be used to spin up
an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "shell" into production as
part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
9 files changed, 1,130 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@28
PS8, Line 28:   // The ID of the cluster, unique to the control shell.
> What's the usecase for having multiple clusters?  Can't a client just spawn
Taking this a step further, I think the API would be better/simpler if the cluster creation arguments were provided as flags on the tool itself, so it automatically brings up the cluster when the tool is run.  This makes the case of 'give me a mini cluster when I run the tool, shut it all down when I control-D' really easy.  I would love this API, personally.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 18:51:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

PS2, Line 47: WIP because, well, it should be pretty obvious. I was able to get through a
            : full run of "mvn verify" locally, so I have confidence that this can work.
            : But I'd like to solicit feedback on the general approach before spending
            : more time applying spit and polish.
general approach seems reasonable to me.

The only concern/question I have is about the formatting and compatibility requirements of the CLI requests/responses. Using something like protobuf or JSON would make it easier to ensure that the commands are extensible, potentially self-documenting, and machine parseable. On the other hand, protobuf at least would require more dependencies to be used in the embedding languages.

Given we have protobuf to/from-JSON support, maybe we could use protobuf to "define" the API, but use JSON to serialize it? Then we don't need to worry about parsing and generation and random things like "what if the minicluster base dir path has a space in it", etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tool: add cluster shell action

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@685
PS8, Line 685: buf.append("\n");
> It's necessary; the format is newline delimited JSON messages.
Ah, I see.  For some reason I thought the serialized JSON is also preceded with length of the block, but it seems it's not and '\n' is used as a delimiter.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@718
PS8, Line 718:   CHECK_EQ(buf->length(), r);
> Correct, the issue is a short read, not a zero read.
I'm not sure that's possible for pipes.  From 'man 7 pipe':

If a process attempts to read from an empty pipe, then read(2) will
block until data is available.  If a process attempts to write to a
full pipe (see below), then write(2) blocks until sufficient data has
been read from the pipe to allow the write to complete.

Do you have an example where short reads are visible for pipes?  Just curious.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@735
PS8, Line 735:   CHECK_EQ(buf.length(), r);
> Short writes are also possible.
I agree that in general short reads are possible, but I'm not sure that's possible for pipes.  As for writes, I don't think short writes are possible at all (unless there is an error, interrupt by signal handler, or the descriptor is set to work in non-blocking mode).

Could you point to some reference/example where it's clear to see that 'short' write is possible in blocking mode?

I found some info at http://www.linux-mag.com/id/308/ (which is Linux specific and might be obsolete now), but even from there I could not see write might return earlier in non-blocking mode.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@346
PS8, Line 346:       ActionBuilder("shell", &RunControlShell)
> I'm in favor of mini_cluster because it's discoverable, self-describing, an
I also think 'kudu test mini_cluster' is a good option.  I'm not advocating for 'control' over 'mini_cluster', just throwing in some more options to choose from.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 22:04:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Jean-Daniel Cryans, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#9).

Change subject: tool: new action for running mini-clusters
......................................................................

tool: new action for running mini-clusters

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "mini_cluster" action to the
Kudu CLI which provides a rudimentary control shell that can be used to spin
up an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "mini_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
9 files changed, 1,132 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 7:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/kudu-tool-test.cc@89
PS7, Line 89: #include "kudu/tools/tool_action_common.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_common.h@39
PS7, Line 39: class faststring;
> warning: invalid case style for class 'faststring' [readability-identifier-
It's a forward declaration of an existing class...


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_common.cc@38
PS7, Line 38: #include "kudu/client/client.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_common.cc@148
PS7, Line 148: using tserver::TabletServerAdminServiceProxy;
> warning: using decl 'TabletServerAdminServiceProxy' is unused [misc-unused-
This is used in template specializations in this file. If I remove it, the build fails.


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_common.cc@149
PS7, Line 149: using tserver::TabletServerServiceProxy;
> warning: using decl 'TabletServerServiceProxy' is unused [misc-unused-using
This is used in template specializations in this file. If I remove it, the build fails.


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_test.cc@76
PS7, Line 76: using std::cin;
> warning: using decl 'cin' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_test.cc@77
PS7, Line 77: using std::cout;
> warning: using decl 'cout' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/7853/7/src/kudu/tools/tool_action_test.cc@78
PS7, Line 78: using std::endl;
> warning: using decl 'endl' is unused [misc-unused-using-decls]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 02 Oct 2017 18:49:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

PS2, Line 36: run_cluster
bikesheddy, but I'd prefer 'kudu mini_cluster'.  Mini cluster is a pretty well established term of art.


PS2, Line 47: WIP because, well, it should be pretty obvious. I was able to get through a
            : full run of "mvn verify" locally, so I have confidence that this can work.
            : But I'd like to solicit feedback on the general approach before spending
            : more time applying spit and polish.
> Would it be possible to use JSON format as an internal format  of the tool,
I'm on board with protobuf for schema / json as encoded format as well.


http://gerrit.cloudera.org:8080/#/c/7853/2/java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
File java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java:

Line 83:     if (responses.size() == 0) {
Is this possible?  I would expect not due to the do part of do/while.


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tool: add cluster shell action

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@663
PS8, Line 663: LOG(INFO)
nit: looks like more VLOG(1) to me.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@669
PS8, Line 669: LOG(INFO)
ditto: VLOG(1) ?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@680
PS8, Line 680: unable to print JSON to stdout
nit: 'unable to serialize into JSON: $0'?

The printing to stdout is done later at line 701, right?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@685
PS8, Line 685: buf.append("\n");
I'm curious whether this is really necessary or it's just for better troubleshooting, etc.?  In any case, it would be nice if you could add a comment explaining the reason of having '\n' appended.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@718
PS8, Line 718:   CHECK_EQ(buf->length(), r);
> This isn't guaranteed, per man 2 read:
But in case of pipes, reading 0 from the read side of the pipe  is possible with pipes only if the writer closes the write side, no?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@216
PS8, Line 216:           opts.master_rpc_ports = { 11030, 11031, 11032 };
> It's checked above on line 196.
Yes, num_master can only be 3 at this point if the code at line 196 stands as is.  My thinking was to assert that even if the code above changes and this piece is not updated.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@346
PS8, Line 346:       ActionBuilder("shell", &RunControlShell)
> I'm opposed to calling this a shell.  There is precedent for the term shell
I have a proposal as well (you can consider it as bike shedding) -- it can be called 'control'.  So, the invocation would be 'kudu test control' instead of 'kudu test shell'.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 18:15:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: add cluster shell action

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@718
PS8, Line 718:   CHECK_EQ(buf->length(), r);
> I'm not sure that's possible for pipes.  From 'man 7 pipe':
Ah, I played with that and I can see you are right: it's possible for pipes as well.  For some reason, I missread the man page.  Sorry.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 22:24:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................


Patch Set 12:

(1 comment)

I think I have some additional feedback on the proto API, but I wanted to run this myself to get some experience before suggesting changes.  I ran into a linker error on macos, perhaps we can debug when we're in the office together:

Undefined symbols for architecture x86_64:
  "kudu::HostPortPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)", referenced from:
      bool google::protobuf::internal::WireFormatLite::ReadMessageNoVirtual<kudu::HostPortPB>(google::protobuf::io::CodedInputStream*, kudu::HostPortPB*) in tool.pb.cc.o
  "kudu::HostPortPB::Clear()", referenced from:
      kudu::tools::DaemonInfoPB::Clear() in tool.pb.cc.o
  "kudu::HostPortPB::MergeFrom(kudu::HostPortPB const&)", referenced from:
      kudu::tools::DaemonInfoPB::MergeFrom(kudu::tools::DaemonInfoPB const&) in tool.pb.cc.o
  "kudu::HostPortPB::HostPortPB(kudu::HostPortPB const&)", referenced from:
      kudu::tools::DaemonInfoPB::DaemonInfoPB(kudu::tools::DaemonInfoPB const&) in tool.pb.cc.o
  "kudu::HostPortPB::HostPortPB()", referenced from:
      kudu::tools::DaemonInfoPB::mutable_bound_rpc_address() in tool.pb.cc.o
  "kudu::AppStatusPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)", referenced from:
      bool google::protobuf::internal::WireFormatLite::ReadMessageNoVirtual<kudu::AppStatusPB>(google::protobuf::io::CodedInputStream*, kudu::AppStatusPB*) in tool.pb.cc.o
  "kudu::AppStatusPB::Clear()", referenced from:
      kudu::tools::ControlShellResponsePB::Clear() in tool.pb.cc.o
  "kudu::AppStatusPB::MergeFrom(kudu::AppStatusPB const&)", referenced from:
      kudu::tools::ControlShellResponsePB::MergeFrom(kudu::tools::ControlShellResponsePB const&) in tool.pb.cc.o
  "kudu::AppStatusPB::AppStatusPB(kudu::AppStatusPB const&)", referenced from:
      kudu::tools::ControlShellResponsePB::ControlShellResponsePB(kudu::tools::ControlShellResponsePB const&) in tool.pb.cc.o
  "kudu::AppStatusPB::AppStatusPB()", referenced from:
      kudu::tools::ControlShellResponsePB::mutable_error() in tool.pb.cc.o
  "kudu::_HostPortPB_default_instance_", referenced from:
      kudu::HostPortPB::internal_default_instance() in tool.pb.cc.o
      kudu::tools::DaemonInfoPB::bound_rpc_address() const in tool.pb.cc.o
  "kudu::_AppStatusPB_default_instance_", referenced from:
      kudu::AppStatusPB::internal_default_instance() in tool.pb.cc.o
      kudu::tools::ControlShellResponsePB::error() const in tool.pb.cc.o
  "kudu::protobuf_kudu_2fcommon_2fcommon_2eproto::InitDefaults()", referenced from:
      kudu::tools::protobuf_kudu_2ftools_2ftool_2eproto::TableStruct::InitDefaultsImpl() in tool.pb.cc.o
  "kudu::protobuf_kudu_2fcommon_2fcommon_2eproto::AddDescriptors()", referenced from:
      kudu::tools::protobuf_kudu_2ftools_2ftool_2eproto::(anonymous namespace)::AddDescriptorsImpl() in tool.pb.cc.o
  "kudu::protobuf_kudu_2fcommon_2fwire_5fprotocol_2eproto::InitDefaults()", referenced from:
      kudu::tools::protobuf_kudu_2ftools_2ftool_2eproto::TableStruct::InitDefaultsImpl() in tool.pb.cc.o
  "kudu::protobuf_kudu_2fcommon_2fwire_5fprotocol_2eproto::AddDescriptors()", referenced from:
      kudu::tools::protobuf_kudu_2ftools_2ftool_2eproto::(anonymous namespace)::AddDescriptorsImpl() in tool.pb.cc.o
  "kudu::HostPortPB::ByteSizeLong() const", referenced from:
      unsigned long google::protobuf::internal::WireFormatLite::MessageSizeNoVirtual<kudu::HostPortPB>(kudu::HostPortPB const&) in tool.pb.cc.o
  "kudu::HostPortPB::IsInitialized() const", referenced from:
      kudu::tools::DaemonInfoPB::IsInitialized() const in tool.pb.cc.o
  "kudu::HostPortPB::InternalSerializeWithCachedSizesToArray(bool, unsigned char*) const", referenced from:
      unsigned char* google::protobuf::internal::WireFormatLite::InternalWriteMessageNoVirtualToArray<kudu::HostPortPB>(int, kudu::HostPortPB const&, bool, unsigned char*) in tool.pb.cc.o
  "kudu::AppStatusPB::ByteSizeLong() const", referenced from:
      unsigned long google::protobuf::internal::WireFormatLite::MessageSizeNoVirtual<kudu::AppStatusPB>(kudu::AppStatusPB const&) in tool.pb.cc.o
  "kudu::AppStatusPB::IsInitialized() const", referenced from:
      kudu::tools::ControlShellResponsePB::IsInitialized() const in tool.pb.cc.o
  "kudu::AppStatusPB::InternalSerializeWithCachedSizesToArray(bool, unsigned char*) const", referenced from:
      unsigned char* google::protobuf::internal::WireFormatLite::InternalWriteMessageNoVirtualToArray<kudu::AppStatusPB>(int, kudu::AppStatusPB const&, bool, unsigned char*) in tool.pb.cc.o

http://gerrit.cloudera.org:8080/#/c/7853/12/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/12/src/kudu/tools/tool.proto@68
PS12, Line 68:   UNKNOWN_DAEMON = 999;
Use 0 for unknown and count up from there.  This is a best-practice, and is enforced as of proto3.  We have other UNKNOWN=999; instances in the codebase because we retroactively added it after already having a 0 value.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 12
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:14:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 2:

(2 comments)

> I think it's a great idea.  What about more machine-oriented
 > interface for the CLI tool?  Do you expect it to be transformed
 > into something JSON-like in the nearest future?
 
See the comment Todd left; we're discussing just that.

 > Maybe, it's worth introducing running a proxy along with
 > minicluster and providing something like REST interface instead of
 > CLI for the tests?

I didn't implement a TCP-based connection between the proxy and the tests exactly so that the entire "port already in use" class of issues can be avoided. When communication is over a TCP socket, we either need to use a well-known port, which is prone to conflicts, or an ephemeral port, whose number needs to be communicated back to the tests. If we've already got a channel for communicating the port number (probably stdout), we can use that channel for control too.

I think a socket-based approach would make more sense if there were multiple consumers of a single mini cluster, but that's just not the case with our tests.

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

PS2, Line 20: authz
> nit here and below: I think it should be 'authn' -- the kerberos-related ac
You are right, my bad.


PS2, Line 47: WIP because, well, it should be pretty obvious. I was able to get through a
            : full run of "mvn verify" locally, so I have confidence that this can work.
            : But I'd like to solicit feedback on the general approach before spending
            : more time applying spit and polish.
> general approach seems reasonable to me.
I went back and forth on this.

JSON (or protobuf, or thrift, or or or...) would certainly make the RPC system far more robust and maintainable. But we'd lose the ability to actually use run_cluster interactively from the command line, because no one wants to write JSON or whatever by hand.

Right now the noun/verb word-based RPC allows for interactivity and that's how I did much of the early testing. It's not robust, but it won't be broken by simple things like e.g. a minicluster dir with a space (it would be broken by unexpected newlines though).

Do you think the pros of using a real serialization format outweigh the loss of interactivity?


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#4).

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................

WIP: use C++ ExternalMiniCluster for Java and Python tests

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "shell" mode to the Kudu CLI
which provides a rudimentary control shell that can be used to spin up an
ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The first cut of the protocol was sh-like, with simple
word-based commands. It was then revised into a PB-based protocol (with
optional JSON encoding) based on feedback. As a proof of concept, the patch
also replaces the bespoke Java mini cluster with callouts to the new shell.

I should add that I like the idea of shipping "shell" into production as
part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

WIP because, well, it should be pretty obvious. I was able to get through a
full run of "mvn verify" locally, so I have confidence that this can work.
But I'd like to solicit feedback on the general approach before spending
more time applying spit and polish.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
D java/kudu-client/src/test/java/org/apache/kudu/client/MiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
D java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKdc.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMiniKuduCluster.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestMultipleLeaderFailover.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
D java/kudu-client/src/test/resources/flags
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
17 files changed, 919 insertions(+), 1,256 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#10).

Change subject: tool: new action for running mini-clusters
......................................................................

tool: new action for running mini-clusters

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "mini_cluster" action to the
Kudu CLI which provides a rudimentary control shell that can be used to spin
up an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "mini_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
10 files changed, 1,156 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(11 comments)

Just skimmed through.  Will take a deeper look tomorrow.

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/kudu-tool-test.cc@1843
PS8, Line 1843:     ASSERT_FALSE(s.ok());
              :     ASSERT_TRUE(s.IsInvalidArgument());
here and below: why is it necessary to verify for ASSERT_FALSE(s.ok()); if next line is about ASSERT_TRUE(s.IsInvlidArgument())?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@60
PS8, Line 60: The generated ID of the cluster, unique to the control shell.
maybe just: 'ID of the cluster to destroy.' ?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@91
PS8, Line 91: required HostPortPB process_address
How does it work for multiple addresses (corresponding to '--rpc_bind_addresses' flag)?  Or we never want to have that test cluster bound to multiple addresses?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@104
PS8, Line 104: HostPortPB
Same question if multiple addresses are allowed: which address should be specified here -- the first one or any of the bound addresses?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@20
PS8, Line 20: <stdlib.h>
nit: please use <cstdlib> instead.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@64
PS8, Line 64: ;
nit: remove extra semicolon


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@102
PS8, Line 102: InvalidArgument
nit: maybe, NotFound is the better choice here?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@133
PS8, Line 133: InvalidArgument
ditto regarding NotFound


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@150
PS8, Line 150: FLAGS_serialization == "json"
As I see, case-insensitive comparison (i.e. boost::iequals(v, "json")) is used in the validator.  Maybe, it makes sense to compare case-insensitive comparison here as well?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@153
PS8, Line 153: "pb", FLAGS_serialization
ditto for case-insensitive comparison.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@216
PS8, Line 216:           opts.master_rpc_ports = { 11030, 11031, 11032 };
nit: add DCHECK_EQ(3, opts.num_masters) ?



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 04:30:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................

tool: new action for running mini-clusters

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "mini_cluster" action to the
Kudu CLI which provides a rudimentary control shell that can be used to spin
up an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "mini_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Reviewed-on: http://gerrit.cloudera.org:8080/7853
Reviewed-by: Dan Burkert <da...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
10 files changed, 1,209 insertions(+), 14 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 14
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

PS2, Line 47: WIP because, well, it should be pretty obvious. I was able to get through a
            : full run of "mvn verify" locally, so I have confidence that this can work.
            : But I'd like to solicit feedback on the general approach before spending
            : more time applying spit and polish.
> I went back and forth on this.
I'm just worried that, as we add features, the parsing is going to get more complicated than just noun/verb. eg what if I want to run a kerberized minicluster, but with a custom realm, or with two KDCs set up for cross-realm trust in the future, etc?

Maybe we could have the most simple "start a non-kerberized cluster with 3 servers" be done with interactive use, since a developer might want to use this for local playing around, but then make anything more complicated (eg custom flags, more exotic configs) require JSON?


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@28
PS8, Line 28: // If not provided, the defaults from that class will be used instead.
> > Once you've gone that far, it's trivial to convert that optional<External
Fine.

I still think there's room for this in the future (e.g. disaster recovery where there may be two clusters that need to be aware of one another), but I suppose it'll be relatively easy to add then.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@29
PS8, Line 29: //
> required fields should never be added to protobuf messages.  This isn't spe
It's a moot point now that I've removed multi-cluster functionality, though I did make the process_id in Start/StopProcessRequestPB optional since you feel strongly about this.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@97
PS8, Line 97: }
> Would this API be used for HMS/Sentry processes if/when those are added to 
Yeah. As we discussed, I changed this to use ID-based addressing, where the IDs are "master0", "tserver1", "kdc", etc. That also means you can stop a KDC through it now.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:22:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: add cluster shell action

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/CMakeLists.txt@120
PS8, Line 120: mini-cluster
This should use an underscore to be consistent with the rest of our libraries.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@28
PS8, Line 28:   // The ID of the cluster, unique to the control shell.
What's the usecase for having multiple clusters?  Can't a client just spawn multiple instances if they need multiple clusters?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@29
PS8, Line 29:   required int32 cluster_id = 1;
use optional


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@61
PS8, Line 61:   required int32 cluster_id = 1;
use optional


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@71
PS8, Line 71:   required int32 cluster_id = 1;
optional


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@80
PS8, Line 80:   required int32 cluster_id = 1;
optional, etc.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@613
PS8, Line 613:       // Read and accumulate one byte at a time, looking for the newline.
Seems like it would be easier and more efficient to keep the read buffer as a field, do bigger reads, and use something like strnchr to find the newline.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@616
PS8, Line 616:         faststring one_byte;
hoist this out of the loop.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@718
PS8, Line 718:   CHECK_EQ(buf->length(), r);
This isn't guaranteed, per man 2 read:

> Upon successful completion, read(), readv(), and pread() return the number of bytes actually read and placed in the buffer.  The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.

I think you may see this in practice if the client sends the length header, and then sends the request body split across multiple write() calls (perhaps pausing in between to make it more racy).


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@216
PS8, Line 216:           opts.master_rpc_ports = { 11030, 11031, 11032 };
> nit: add DCHECK_EQ(3, opts.num_masters) ?
It's checked above on line 196.

Isn't the hard-coded master ports going to cause problems for multi-master, multi-cluster shells?  I think this is another good reason not to support multiple clusters.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@346
PS8, Line 346:       ActionBuilder("shell", &RunControlShell)
I'm opposed to calling this a shell.  There is precedent for the term shell with noSQL databases, and it's not this.  People regularly ask if there's a Kudu shell, and they aren't asking for this.  Eventually we might have a proper shell in the kudu tool.  I think 'kudu mini_cluster' is a better name, given that mini-cluster is an established term of art.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 15:24:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................


Patch Set 13: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 13
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 06 Oct 2017 22:20:02 +0000
Gerrit-HasComments: No

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8: Verified+1

Overriding Jenkins, the 10 test failures were all due to an unsynchronized clock on one dist-test slave.


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 02 Oct 2017 20:50:57 +0000
Gerrit-HasComments: No

[kudu-CR] tool: add cluster shell action

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#5).

Change subject: tool: add cluster shell action
......................................................................

tool: add cluster shell action

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "shell" action to the Kudu
CLI which provides a rudimentary control shell that can be used to spin up
an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "shell" into production as
part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
9 files changed, 1,104 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG@47
PS2, Line 47: 
            : WIP because, well, it should be pretty obvious. I was able to get through a
            : full run of "mvn verify" locally, so I have confidence that this can work.
            : But I'd like to solicit feedback on
> I'm on board with protobuf for schema / json as encoded format as well.
OK, PS3 changes the interface to use a protobuf-based schema with an optional JSON serialization. The former provides more type safety for clients who are willing to eat the protobuf dependency and the latter for those who aren't.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Sep 2017 02:13:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: add cluster shell action

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@735
PS8, Line 735:   CHECK_EQ(buf.length(), r);
> I agree that in general short reads are possible, but I'm not sure that's p
Yep, as I found, short reads are possible for pipes, indeed (shame on me: I forgot my previous experience and mis-sread the man page).  However, for writing into pipes, I'm not sure it's the case.

From the other side, if it's possible to implement the write path to work even with short writes as well (regardless whether it's possible or not for pipes), that might be the best.

Please forgive my bikeshedding :)



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 22:55:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

PS2, Line 47: WIP because, well, it should be pretty obvious. I was able to get through a
            : full run of "mvn verify" locally, so I have confidence that this can work.
            : But I'd like to solicit feedback on the general approach before spending
            : more time applying spit and polish.
> I went back and forth on this.
Would it be possible to use JSON format as an internal format  of the tool, while translating the command-line arguments into a JSON document which would be consumed internally?  Basically, it's about a shim layer which translates the command-line arguments into a JSON representation.

Vice versa: it could be the same for the output.  As for the output, there are libraries like libxo to handle that in a uniform way:
  https://github.com/Juniper/libxo


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: use C++ ExternalMiniCluster for Java and Python tests

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: WIP: use C++ ExternalMiniCluster for Java and Python tests
......................................................................


Patch Set 2:

(1 comment)

I think it's a great idea.  What about more machine-oriented interface for the CLI tool?  Do you expect it to be transformed into something JSON-like in the nearest future?

Maybe, it's worth introducing running a proxy along with minicluster and providing something like REST interface instead of CLI for the tests?

http://gerrit.cloudera.org:8080/#/c/7853/2//COMMIT_MSG
Commit Message:

PS2, Line 20: authz
nit here and below: I think it should be 'authn' -- the kerberos-related activity is related to authentication, not authorization (and Sentry integration is supposed to take care of the fine-grained authorization in the future)


-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................


Patch Set 8:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/CMakeLists.txt@120
PS8, Line 120: mini-cluster
> nevermind, bad suggestion
Heh, between the time you made the suggestion and the time you rescinded it I wrote a patch to change mini-cluster to mini_cluster (and integration-tests to itest_util). So, too late.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/kudu-tool-test.cc@1843
PS8, Line 1843:     ASSERT_FALSE(s.ok());
              :     ASSERT_TRUE(s.IsInvalidArgument());
> here and below: why is it necessary to verify for ASSERT_FALSE(s.ok()); if 
I instinctively did it this way, but I have no good explanation for why. I'll remove it.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@28
PS8, Line 28:   // The ID of the cluster, unique to the control shell.
> Taking this a step further, I think the API would be better/simpler if the 
My original prototype used gflags for cluster options. Once I transitioned to this structured control protocol, I didn't think it made sense to do that anymore, because that meant marshaling some stuff via PB/json (commands) and the rest through gflags. For trivial options it doesn't matter, but for more complicated ones (like extra_tserver_flags), I'd prefer to marshal everything the same way. In the future I imagine we'll expose even more options and I'd much rather do so via PB/json than figuring out how to shoehorn them into gflags. Alexey also suggested this approach, I believe.

Given the above, we have to bring up the shell "empty" and force clients to explicitly create a cluster. But we also want to support "cluster stop" separately from "cluster stop and destroy", because some tests might want to restart their cluster. So that means the shell needs to be at least stateful enough to differentiate between "cluster exists" and "cluster does not exist".

Once you've gone that far, it's trivial to convert that optional<ExternalMiniCluster> into a map<int, ExternalMiniCluster> and support multiple clusters, so I did it even though there's no explicit use case. FWIW, think reusing a single control instance for that is preferable than forcing clients to manage multiple instances.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@29
PS8, Line 29:   required int32 cluster_id = 1;
> use optional
Why? This message is useless if it doesn't include the cluster_id.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@60
PS8, Line 60: The generated ID of the cluster, unique to the control shell.
> maybe just: 'ID of the cluster to destroy.' ?
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@61
PS8, Line 61:   required int32 cluster_id = 1;
> use optional
See above.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@71
PS8, Line 71:   required int32 cluster_id = 1;
> optional
See above.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@80
PS8, Line 80:   required int32 cluster_id = 1;
> optional, etc.
See above.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@91
PS8, Line 91: required HostPortPB process_address
> How does it work for multiple addresses (corresponding to '--rpc_bind_addre
Minicluster daemons are only ever bound to one address, and that address (with the port) is always unique and can be used to identify the daemon.

Will we ever bind daemons to multiple addresses? Possibly, but it's tough to anticipate exactly what that would look like. Commands that identify a daemon via address could still work because every address is still unique, but Get{Master,Tserver}Addresses may have to change.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@104
PS8, Line 104: HostPortPB
> Same question if multiple addresses are allowed: which address should be sp
See above.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@613
PS8, Line 613:       // Read and accumulate one byte at a time, looking for the newline.
> Seems like it would be easier and more efficient to keep the read buffer as
It would be, but I didn't think it really mattered since this control protocol (and JSON mode) is seldom used, and it means maintaining real state (i.e. a partially read buffer).

I'll add a TODO though.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@616
PS8, Line 616:         faststring one_byte;
> hoist this out of the loop.
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@663
PS8, Line 663: LOG(INFO)
> nit: looks like more VLOG(1) to me.
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@669
PS8, Line 669: LOG(INFO)
> ditto: VLOG(1) ?
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@680
PS8, Line 680: unable to print JSON to stdout
> nit: 'unable to serialize into JSON: $0'?
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@684
PS8, Line 684:       buf.append(serialized);
> Have you checked what happens if the message has a string field containing 
It looks like they're properly escaped. I tested by adding a "\n" to a cluster's data_root option and the protocol still worked.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@685
PS8, Line 685: buf.append("\n");
> Ah, I see.  For some reason I thought the serialized JSON is also preceded 
Correct; see the SerializationMode class comment.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@718
PS8, Line 718:   CHECK_EQ(buf->length(), r);
> Ah, I played with that and I can see you are right: it's possible for pipes
Hmm, OK, I guess I'll make this resilient to short reads.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@735
PS8, Line 735:   CHECK_EQ(buf.length(), r);
> Yep, as I found, short reads are possible for pipes, indeed (shame on me: I
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@20
PS8, Line 20: <stdlib.h>
> nit: please use <cstdlib> instead.
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@64
PS8, Line 64: ;
> nit: remove extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@102
PS8, Line 102: InvalidArgument
> nit: maybe, NotFound is the better choice here?
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@133
PS8, Line 133: InvalidArgument
> ditto regarding NotFound
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@150
PS8, Line 150: FLAGS_serialization == "json"
> As I see, case-insensitive comparison (i.e. boost::iequals(v, "json")) is u
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@153
PS8, Line 153: "pb", FLAGS_serialization
> ditto for case-insensitive comparison.
Done


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@216
PS8, Line 216:           opts.master_rpc_ports = { 11030, 11031, 11032 };
> Yes, num_master can only be 3 at this point if the code at line 196 stands 
Yes, the hard-coding means multiple clusters won't work right now with multiple shells, but that's something we've discussed fixing (i.e. by binding to ephemeral ports when creating the cluster, then passing those bound sockets into the processes themselves).


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@346
PS8, Line 346:       ActionBuilder("shell", &RunControlShell)
> I also think 'kudu test mini_cluster' is a good option.  I'm not advocating
I'll change it to "mini_cluster", but I'm still going to also refer to it as a "control shell" in documentation. I think it's important, otherwise people might get the impression that just running the tool will cause something to happen rather than drop them into a shell.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:08:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: new action for running mini-clusters
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto
File src/kudu/tools/tool.proto:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@28
PS8, Line 28:   // The ID of the cluster, unique to the control shell.
> Once you've gone that far, it's trivial to convert that optional<ExternalMiniCluster> into a map<int, ExternalMiniCluster> and support multiple clusters, so I did it even though there's no explicit use case. FWIW, think reusing a single control instance for that is preferable than forcing clients to manage multiple instances.

And it's trivially easy for the user to manage this on their end through a similar map of ID -> MiniCluster.  I don't see the advantage of building it into the protocol when there's a perfectly fine workaround, and we don't know of any potential uses.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@29
PS8, Line 29:   required int32 cluster_id = 1;
> Why? This message is useless if it doesn't include the cluster_id.
required fields should never be added to protobuf messages.  This isn't specific to mini-cluster, it's just a protobuf best practice.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool.proto@97
PS8, Line 97: message StopProcessRequestPB {
Would this API be used for HMS/Sentry processes if/when those are added to the mini cluster?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@346
PS8, Line 346:       ActionBuilder("shell", &RunControlShell)
> I'll change it to "mini_cluster", but I'm still going to also refer to it a
SGTM, i'd just suggest fully qualifying it as 'mini-cluster control shell' wherever it might be ambiguous.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:29:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] tool: new action for running mini-clusters

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Dan Burkert, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7853

to look at the new patch set (#13).

Change subject: tool: new action for running mini-clusters
......................................................................

tool: new action for running mini-clusters

Maintaining Kudu clients across various languages has been an ongoing
maintenance burden. Even when the client is just a thin wrapper around
another client (e.g. Kudu Python bindings), a great deal of work goes into
client testability. In practice, this has meant a bespoke mini cluster
implementation for each language. On the surface this doesn't seem that bad;
we just need to spawn some masters and tservers, right? Well, the work
quickly adds up:

o While the C++ mini cluster is heavily used and has seen many improvements,
  the Java mini cluster has not received the same kind of love, and is less
  robust as a result. KUDU-1976 is a great example of this deficiency.
o With the inclusion of authn came the addition of a "mini KDC", a special
  daemon for Kerberized mini clusters. It was originally implemented in C++
  and ported to Java, but has yet to be ported to the Python client; this is
  one of the obstacles towards porting full authn support to Python.
o Dan has been prototyping Hive Metastore and Sentry integration for Kudu,
  the testing of which will require "mini HMS" and possibly "mini Sentry"
  testing implementations in C++, Java, and eventually, Python.

In sum, good support for non-C++ mini clusters is an ongoing commitment and
requires a great deal of work. This work hasn't always been forthcoming, and
the non-C++ clusters are deficient as a result. But it doesn't have to be
this way! Here's a thought: what if we reused the C++ mini cluster for tests
written in these other languages? We could write a "proxy" application whose
job it is to manage the C++ mini cluster and expose a rudimentary API that's
easily programmable from Java and Python.

This patch attempts to do just that. It adds a "mini_cluster" action to the
Kudu CLI which provides a rudimentary control shell that can be used to spin
up an ExternalMiniCluster. The shell is controlled via a wire protocol over
stdin/stdout. The protocol is protobuf-based with optional JSON encoding.

I should add that I like the idea of shipping "mini_cluster" into production
as part of the CLI, as it helps realize the vision of a single Kudu artifact
that can provide Kudu testability for any integrating product.

Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
---
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
A src/kudu/tools/tool.proto
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
10 files changed, 1,209 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7853/13
-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 13
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tool: add cluster shell action

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/7853 )

Change subject: tool: add cluster shell action
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@684
PS8, Line 684:       buf.append(serialized);
Have you checked what happens if the message has a string field containing a newline?


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@685
PS8, Line 685: buf.append("\n");
> I'm curious whether this is really necessary or it's just for better troubl
It's necessary; the format is newline delimited JSON messages.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@718
PS8, Line 718:   CHECK_EQ(buf->length(), r);
> But in case of pipes, reading 0 from the read side of the pipe  is possible
Correct, the issue is a short read, not a zero read.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_common.cc@735
PS8, Line 735:   CHECK_EQ(buf.length(), r);
Short writes are also possible.


http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

http://gerrit.cloudera.org:8080/#/c/7853/8/src/kudu/tools/tool_action_test.cc@346
PS8, Line 346:       ActionBuilder("shell", &RunControlShell)
> I have a proposal as well (you can consider it as bike shedding) -- it can 
I'm in favor of mini_cluster because it's discoverable, self-describing, and unambiguous.  I don't think 'control' has any of those qualities.



-- 
To view, visit http://gerrit.cloudera.org:8080/7853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e693921ef780dc4a06e536c6b7408f7f0b252f6
Gerrit-Change-Number: 7853
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Oct 2017 18:30:05 +0000
Gerrit-HasComments: Yes