You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Brown (Code Review)" <ge...@cloudera.org> on 2018/04/30 21:28:55 UTC

[Impala-ASF-CR](2.x) IMPALA-6070: Further improvements to test-with-docker.

Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10248

to review the following change.


Change subject: IMPALA-6070: Further improvements to test-with-docker.
......................................................................

IMPALA-6070: Further improvements to test-with-docker.

This commit tackles a few additions and improvements to
test-with-docker. In general, I'm adding workloads (e.g., exhaustive,
rat-check), tuning memory setting and parallelism, and trying to speed
things up.

Bug fixes:

* Embarassingly, I was still skipping thrift-server-test in the backend
  tests. This was a mistake in handling feedback from my last review.

* I made the timeline a little bit taller to clip less.

Adding workloads:

* I added the RAT licensing check.

* I added exhaustive runs. This led me to model the suites a little
  bit more in Python, with a class representing a suite with a
  bunch of data about the suite. It's not perfect and still
  coupled with the entrypoint.sh shell script, but it feels
  workable. As part of adding exhaustive tests, I had
  to re-work the timeout handling, since now different
  suites meaningfully have different timeouts.

Speed ups:

* To speed up test runs, I added a mechanism to split py.test suites into
  multiple shards with a py.test argument. This involved a little bit of work in
  conftest.py, and exposing $RUN_CUSTOM_CLUSTER_TESTS_ARGS in run-all-tests.sh.

  Furthermore, I moved a bit more logic about managing the
  list of suites into Python.

* Doing the full build with "-notests" and only building
  the backend tests in the relevant target that needs them. This speeds
  up "docker commit" significantly by removing about 20GB from the
  container.  I had to indicates that expr-codegen-test depends on
  expr-codegen-test-ir, which was missing.

* I sped up copying the Kudu data: previously I did
  both a move and a copy; now I'm doing a move followed by a move. One
  of the moves is cross-filesystem so is slow, but this does half the
  amount of copying.

Memory usage:

* I tweaked the memlimit_gb settings to have a higher default. I've been
  fighting empirically to have the tests run well on c4.8xlarge and
  m4.10xlarge.

The more memory a minicluster and test suite run uses, the fewer parallel
suites we can run. By observing the peak processes at the tail of a run (with a
new "memory_usage" function that uses a ps/sort/awk trick) and by observing
peak container total_rss, I found that we had several JVMs that
didn't have Xmx settings set. I added Xms/Xmx settings in a few
places:

 * The non-first Impalad does very little JVM work, so having
   an Xmx keeps it small, even in the parallel tests.
 * Datanodes do work, but they essentially were never garbage
   collecting, because JVM defaults let them use up to 1/4th
   the machine memory. (I observed this based on RSS at the
   end of the run; nothing fancier.) Adding Xms/Xmx settings
   helped.
 * Similarly, I piped the settings through to HBase.

A few daemons still run without resource limitations, but they don't
seem to be a problem.

Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Reviewed-on: http://gerrit.cloudera.org:8080/10123
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exprs/CMakeLists.txt
M bin/run-all-tests.sh
M docker/entrypoint.sh
M docker/monitor.py
M docker/test-with-docker.py
M docker/timeline.html.template
M testdata/bin/run-hbase.sh
M testdata/cluster/node_templates/common/etc/init.d/hdfs-common
M tests/conftest.py
M tests/run-tests.py
10 files changed, 425 insertions(+), 148 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/10248/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Gerrit-Change-Number: 10248
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR](2.x) IMPALA-6070: Further improvements to test-with-docker.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10248 )

Change subject: IMPALA-6070: Further improvements to test-with-docker.
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2385/


-- 
To view, visit http://gerrit.cloudera.org:8080/10248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Gerrit-Change-Number: 10248
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Apr 2018 22:52:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6070: Further improvements to test-with-docker.

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/10248 )

Change subject: IMPALA-6070: Further improvements to test-with-docker.
......................................................................


Patch Set 1:

(1 comment)

Not a clean cherry-pick. See below.

http://gerrit.cloudera.org:8080/#/c/10248/1/testdata/bin/run-hbase.sh
File testdata/bin/run-hbase.sh:

http://gerrit.cloudera.org:8080/#/c/10248/1/testdata/bin/run-hbase.sh@39
PS1, Line 39: export HBASE_HEAPSIZE=1g
The fixed conflict was here: On master, we expected to add L39 under an unrelated IMPALA_MINICLUSTER_PROFILE conditional. On 2.x, that conditional is not here. I confirmed HBASE_HEAPSIZE is an env variable in our toolchain Hbase 1.2.0, so it seems OK to have this here.



-- 
To view, visit http://gerrit.cloudera.org:8080/10248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Gerrit-Change-Number: 10248
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Apr 2018 21:32:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR](2.x) IMPALA-6070: Further improvements to test-with-docker.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10248 )

Change subject: IMPALA-6070: Further improvements to test-with-docker.
......................................................................


Patch Set 1: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/10248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Gerrit-Change-Number: 10248
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 01 May 2018 02:46:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6070: Further improvements to test-with-docker.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/10248 )

Change subject: IMPALA-6070: Further improvements to test-with-docker.
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Gerrit-Change-Number: 10248
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Apr 2018 22:18:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6070: Further improvements to test-with-docker.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10248 )

Change subject: IMPALA-6070: Further improvements to test-with-docker.
......................................................................

IMPALA-6070: Further improvements to test-with-docker.

This commit tackles a few additions and improvements to
test-with-docker. In general, I'm adding workloads (e.g., exhaustive,
rat-check), tuning memory setting and parallelism, and trying to speed
things up.

Bug fixes:

* Embarassingly, I was still skipping thrift-server-test in the backend
  tests. This was a mistake in handling feedback from my last review.

* I made the timeline a little bit taller to clip less.

Adding workloads:

* I added the RAT licensing check.

* I added exhaustive runs. This led me to model the suites a little
  bit more in Python, with a class representing a suite with a
  bunch of data about the suite. It's not perfect and still
  coupled with the entrypoint.sh shell script, but it feels
  workable. As part of adding exhaustive tests, I had
  to re-work the timeout handling, since now different
  suites meaningfully have different timeouts.

Speed ups:

* To speed up test runs, I added a mechanism to split py.test suites into
  multiple shards with a py.test argument. This involved a little bit of work in
  conftest.py, and exposing $RUN_CUSTOM_CLUSTER_TESTS_ARGS in run-all-tests.sh.

  Furthermore, I moved a bit more logic about managing the
  list of suites into Python.

* Doing the full build with "-notests" and only building
  the backend tests in the relevant target that needs them. This speeds
  up "docker commit" significantly by removing about 20GB from the
  container.  I had to indicates that expr-codegen-test depends on
  expr-codegen-test-ir, which was missing.

* I sped up copying the Kudu data: previously I did
  both a move and a copy; now I'm doing a move followed by a move. One
  of the moves is cross-filesystem so is slow, but this does half the
  amount of copying.

Memory usage:

* I tweaked the memlimit_gb settings to have a higher default. I've been
  fighting empirically to have the tests run well on c4.8xlarge and
  m4.10xlarge.

The more memory a minicluster and test suite run uses, the fewer parallel
suites we can run. By observing the peak processes at the tail of a run (with a
new "memory_usage" function that uses a ps/sort/awk trick) and by observing
peak container total_rss, I found that we had several JVMs that
didn't have Xmx settings set. I added Xms/Xmx settings in a few
places:

 * The non-first Impalad does very little JVM work, so having
   an Xmx keeps it small, even in the parallel tests.
 * Datanodes do work, but they essentially were never garbage
   collecting, because JVM defaults let them use up to 1/4th
   the machine memory. (I observed this based on RSS at the
   end of the run; nothing fancier.) Adding Xms/Xmx settings
   helped.
 * Similarly, I piped the settings through to HBase.

A few daemons still run without resource limitations, but they don't
seem to be a problem.

Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Reviewed-on: http://gerrit.cloudera.org:8080/10123
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/10248
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
---
M be/src/exprs/CMakeLists.txt
M bin/run-all-tests.sh
M docker/entrypoint.sh
M docker/monitor.py
M docker/test-with-docker.py
M docker/timeline.html.template
M testdata/bin/run-hbase.sh
M testdata/cluster/node_templates/common/etc/init.d/hdfs-common
M tests/conftest.py
M tests/run-tests.py
10 files changed, 425 insertions(+), 148 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/10248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Gerrit-Change-Number: 10248
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>