Posted to commits@impala.apache.org by mi...@apache.org on 2018/10/26 19:18:54 UTC

[1/3] impala git commit: [DOCS] 2 Typos fixed in NVL2 examples

Repository: impala
Updated Branches:
  refs/heads/master 93ee538c5 -> de0c6bd6b


[DOCS] 2 Typos fixed in NVL2 examples

Change-Id: Ib3ac978398eb3de1877e3cd26f662a34c3f131d0
Reviewed-on: http://gerrit.cloudera.org:8080/11795
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/9bd22a3c
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/9bd22a3c
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/9bd22a3c

Branch: refs/heads/master
Commit: 9bd22a3ce65a3186153b5d7988eae2bb48559b26
Parents: 93ee538
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Oct 25 19:05:07 2018 -0700
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Fri Oct 26 02:17:07 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_conditional_functions.xml | 27 +++++++++--------------
 1 file changed, 11 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/9bd22a3c/docs/topics/impala_conditional_functions.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_conditional_functions.xml b/docs/topics/impala_conditional_functions.xml
index 78dd62a..b500091 100644
--- a/docs/topics/impala_conditional_functions.xml
+++ b/docs/topics/impala_conditional_functions.xml
@@ -677,24 +677,19 @@ END</codeblock>
         </dt>
 
         <dd>
-          <b>Purpose:</b> Returns the second argument, <codeph>ifNotNull</codeph>, if the first
-          argument is not <codeph>NULL</codeph>. Returns the third argument,
-          <codeph>ifNull</codeph>, if the first argument is <codeph>NULL</codeph>.
-          <p>
-            Equivalent to the <codeph>NVL2()</codeph> function in Oracle Database.
-          </p>
-
-          <p>
-            <b>Return type:</b> Same as the first argument value
-          </p>
-
+          <b>Purpose:</b> Returns the second argument,
+            <codeph>ifNotNull</codeph>, if the first argument is not
+            <codeph>NULL</codeph>. Returns the third argument,
+            <codeph>ifNull</codeph>, if the first argument is
+            <codeph>NULL</codeph>. <p> Equivalent to the <codeph>NVL2()</codeph>
+            function in Oracle Database. </p>
+          <p>
+            <b>Return type:</b> Same as the first argument value </p>
           <p conref="../shared/impala_common.xml#common/added_in_290"/>
-
-          <p conref="../shared/impala_common.xml#common/example_blurb"
-          />
-<codeblock>
+          <p conref="../shared/impala_common.xml#common/example_blurb"/>
+          <codeblock>
 SELECT NVL2(NULL, 999, 0); -- Returns 0
-SELECT NVL2('ABC', 'Is Null', 'Is Not Null); -- Returns 'Is Not Null'</codeblock>
+SELECT NVL2('ABC', 'Is Not Null', 'Is Null'); -- Returns 'Is Not Null'</codeblock>
         </dd>
 
       </dlentry>


[3/3] impala git commit: test-with-docker: allow built images to be used with "docker run" easily.

Posted by mi...@apache.org.
test-with-docker: allow built images to be used with "docker run" easily.

Configures the built container to enter into a script that
starts the minicluster. As a result, "docker run -ti <container>" will
launch the user into a shell with the Impala minicluster and the
Impala development cluster running.

To handle cases where users don't specify --privileged, we skip
Kudu if NTP seems unavailable.
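
As a rough illustration of the NTP check described above, here is a minimal Python sketch. It is not part of this commit, which implements the check in docker/entrypoint.sh. The function name kudu_is_supported and its probe_cmd parameter are hypothetical; the sketch treats a failing or missing probe command as a sign that Kudu should be skipped.

```python
import subprocess


def kudu_is_supported(probe_cmd=("ntptime",)):
    """Return True if the probe command exits with status 0.

    Hypothetical helper: a non-zero exit status, or a missing binary,
    is taken to mean NTP is unavailable and Kudu should be skipped.
    """
    try:
        result = subprocess.run(
            list(probe_cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The probe binary is not installed or not executable.
        return False
    return result.returncode == 0
```

A caller could use the result to decide whether to set KUDU_IS_SUPPORTED=false before starting the minicluster.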

Change-Id: Ib8d6a28d4cb4ab019cd72415024b23374a6d9e2f
Reviewed-on: http://gerrit.cloudera.org:8080/11781
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/de0c6bd6
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/de0c6bd6
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/de0c6bd6

Branch: refs/heads/master
Commit: de0c6bd6bd0db163d2820ae238bee5887f410f52
Parents: c170107
Author: Philip Zeyliger <ph...@cloudera.com>
Authored: Mon Oct 22 21:38:14 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Oct 26 18:44:58 2018 +0000

----------------------------------------------------------------------
 docker/entrypoint.sh       | 65 ++++++++++++++++++++++++++++++++++-------
 docker/test-with-docker.py |  3 ++
 2 files changed, 58 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/de0c6bd6/docker/entrypoint.sh
----------------------------------------------------------------------
diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index 1dbc6c1..50a38bd 100755
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -145,16 +145,18 @@ function start_minicluster {
   # presumably because there's only one layer involved. See
   # https://issues.apache.org/jira/browse/KUDU-1419.
   set -x
-  pushd /home/impdev/Impala/testdata
-  for x in cluster/cdh*/node-*/var/lib/kudu/*/wal; do
-    echo $x
-    # This mv takes time, as it's actually copying into the latest layer.
-    mv $x $x-orig
-    mkdir $x
-    mv $x-orig/* $x
-    rmdir $x-orig
-  done
-  popd
+  if [ "true" = $KUDU_IS_SUPPORTED ]; then
+    pushd /home/impdev/Impala/testdata
+    for x in cluster/cdh*/node-*/var/lib/kudu/*/wal; do
+      echo $x
+      # This mv takes time, as it's actually copying into the latest layer.
+      mv $x $x-orig
+      mkdir $x
+      mv $x-orig/* $x
+      rmdir $x-orig
+    done
+    popd
+  fi
 
   # Wait for postgresql to really start; if it doesn't, Hive Metastore will fail to start.
   for i in {1..120}; do
@@ -387,6 +389,42 @@ function configure_timezone() {
   fi
 }
 
+# Exposes a shell, with the container booted with
+# a minicluster.
+function shell() {
+  echo "Starting minicluster and Impala."
+  # Logs is typically a symlink; remove it if so.
+  rm logs || true
+  mkdir -p logs
+  boot_container
+  impala_environment
+  # Kudu requires --privileged for the Docker container; see
+  # https://issues.apache.org/jira/browse/KUDU-2000. Because
+  # our goal here is convenience for new developers, we
+  # skip kudu if "ntptime" doesn't work, which is a good
+  # proxy for Kudu won't start.
+  if ! ntptime > /dev/null; then
+    export KUDU_IS_SUPPORTED=false
+    KUDU_MSG="Kudu is not started."
+  fi
+  start_minicluster
+  bin/start-impala-cluster.py
+  cat <<"EOF"
+
+==========================================================
+Welcome to the Impala development environment.
+
+The "minicluster" is running; i.e., HDFS, HBase, Hive,
+etc. are running. $KUDU_MSG
+
+To get started, perhaps run:
+  impala-shell.sh -q 'select count(*) from tpcds.web_page'
+==========================================================
+
+EOF
+  exec bash
+}
+
 function main() {
   set -e
 
@@ -394,6 +432,13 @@ function main() {
   CMD="$1"
   shift
 
+  # Treat shell specially to avoid the extra logging and | cat below.
+  if [[ $CMD = "shell" ]]; then
+    shell
+    # shell should have exec'd, so if we get here, it's a failure.
+    exit 1
+  fi
+
   echo ">>> ${CMD} $@ (begin)"
   # Dump environment, for debugging
   env | grep -vE "AWS_(SECRET_)?ACCESS_KEY"

http://git-wip-us.apache.org/repos/asf/impala/blob/de0c6bd6/docker/test-with-docker.py
----------------------------------------------------------------------
diff --git a/docker/test-with-docker.py b/docker/test-with-docker.py
index b350e4c..42808cf 100755
--- a/docker/test-with-docker.py
+++ b/docker/test-with-docker.py
@@ -653,6 +653,9 @@ class TestWithDocker(object):
       self.image = _check_output(
           ["docker", "commit",
            "-c", "LABEL pwd=" + self.git_root,
+           "-c", "USER impdev",
+           "-c", "WORKDIR /home/impdev/Impala",
+           "-c", 'CMD ["/home/impdev/Impala/docker/entrypoint.sh", "shell"]',
            container.id, "impala:built-" + self.name]).strip()
       logging.info("Committed docker image: %s", self.image)
     finally:


[2/3] impala git commit: IMPALA-7698: Add centos support to bootstrap_system.

Posted by mi...@apache.org.
IMPALA-7698: Add centos support to bootstrap_system.

Largely, the changes involve conditionalizing some invocations to
account for differences between RH and Ubuntu. The trickiest bits were
timezone-related test errors (see below), postgresql permissions (need
to accept md5 passwords from localhost) and default ulimits (1024 user
processes/threads is not enough).
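
To illustrate the ulimit point, the following Python sketch raises the soft "max user processes" limit to the current hard limit, which is what the "ulimit -u" call added to entrypoint.sh aims to do. The commit itself does this in shell; the function name raise_nproc_to_hard_limit is an assumption made for this example.

```python
import resource


def raise_nproc_to_hard_limit():
    """Raise the soft 'max user processes' limit to the current hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)
```

The new limit applies to the calling process and is inherited by any children it starts afterwards.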

To test this, I built using test-with-docker. In addition to the
ulimit issue, I ran into the fact that /tmp needed 1777 permissions for
the postgresql socket, and entrypoint.sh had a few places that needed
special cases. At the moment, the data load ran fine, as did most of the
tests. I observed a test that relied on a python2.7-ism fail, which is
part of the point of this.

In the course of development, I saw a handful of tests fail with
"Encounter parse error: failed to open /usr/share/zoneinfo/GMT-08:00 -
No such file or directory.", which was reproduced as follows:

    [localhost:21000] default> use functional_orc_def; select * from alltypes;
    ...
    WARNINGS: Encounter parse error: failed to open /usr/share/zoneinfo/GMT-08:00 - No such file or directory.

With Quanlong's help, I learned what was happening. test-with-docker was
translating my time zone (America/Los_Angeles) to US/Pacific-New,
because realpath(/etc/localtime) = US/Pacific-New. This timezone exists
in centos:6, so that wasn't a problem. However, this timezone does not
exist in the package "tzdata-java", which is the copy of the timezone
information used by Java. (There are bugs here that may have been fixed
in centos:7.) As a result, when ORC asks (by using
TimeZone.getDefault().getID()) the JDK
(src/solaris/native/java/util/TimeZone_md.c) for the default timezone,
it can't find the same name as /etc/localtime points to in its
repository and defaults to "GMT-08:00". This string then gets written
into the ORC files generated by Hive as part of data load, and then the
C++ library can't read them. This is fixed by changing "realpath"
to "readlink" in test-with-docker.py.
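
The difference between the two calls can be reproduced with a scratch symlink tree. This is a sketch under stated assumptions: the temporary directory layout only mimics /usr/share/zoneinfo, the file it creates is a placeholder rather than real tzdata, and link_targets is a hypothetical helper name.

```python
import os
import tempfile


def link_targets(path):
    """Return (one-hop target, fully resolved target) for a symlink."""
    return os.readlink(path), os.path.realpath(path)


# Build a scratch tree that mimics the chain from the message:
#   localtime -> zoneinfo/America/Los_Angeles -> ../US/Pacific-New
root = os.path.realpath(tempfile.mkdtemp())
zoneinfo = os.path.join(root, "zoneinfo")
os.makedirs(os.path.join(zoneinfo, "America"))
os.makedirs(os.path.join(zoneinfo, "US"))
pacific_new = os.path.join(zoneinfo, "US", "Pacific-New")
with open(pacific_new, "w") as f:
    f.write("placeholder, not real tzdata\n")
los_angeles = os.path.join(zoneinfo, "America", "Los_Angeles")
os.symlink("../US/Pacific-New", los_angeles)
localtime = os.path.join(root, "localtime")
os.symlink(los_angeles, localtime)

one_hop, resolved = link_targets(localtime)
print(one_hop)   # path ending in zoneinfo/America/Los_Angeles
print(resolved)  # path ending in zoneinfo/US/Pacific-New
```

os.readlink follows only the first link, so it keeps the America/Los_Angeles name, whereas os.path.realpath follows the whole chain and ends at US/Pacific-New.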

centos:7 is not addressed by this change. The move to systemd makes
"service sshd start" (and the same for postgresql) not work, and
additional care needs to be taken to work around that.

This change is a joint effort with Laszlo Gaal.

Change-Id: Id54294d7607f51de87a9de373dcfc4a33f4bedf5
Reviewed-on: http://gerrit.cloudera.org:8080/11731
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/c1701074
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/c1701074
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/c1701074

Branch: refs/heads/master
Commit: c1701074d6e94d98a43ab049ef807ac1b368180f
Parents: 9bd22a3
Author: Philip Zeyliger <ph...@cloudera.com>
Authored: Mon Oct 15 09:42:48 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Oct 26 08:43:22 2018 +0000

----------------------------------------------------------------------
 bin/bootstrap_system.sh    | 174 ++++++++++++++++++++++++++++------------
 docker/entrypoint.sh       |  32 ++++++--
 docker/test-with-docker.py |  23 ++++--
 3 files changed, 167 insertions(+), 62 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/c1701074/bin/bootstrap_system.sh
----------------------------------------------------------------------
diff --git a/bin/bootstrap_system.sh b/bin/bootstrap_system.sh
index e5f9fa9..0861279 100755
--- a/bin/bootstrap_system.sh
+++ b/bin/bootstrap_system.sh
@@ -40,6 +40,8 @@
 
 set -eu -o pipefail
 
+: ${IMPALA_HOME:=~/Impala}
+
 if [[ -t 1 ]] # if on an interactive terminal
 then
   echo "This script will clobber some system settings. Are you sure you want to"
@@ -63,21 +65,44 @@ fi
 
 set -x
 
-source /etc/lsb-release
+# Determine whether we're running on redhat or ubuntu
+REDHAT=
+UBUNTU=
+if [[ -f /etc/redhat-release ]]; then
+  REDHAT=true
+  # TODO: restrict redhat versions
+else
+  source /etc/lsb-release
+  if ! [[ $DISTRIB_ID = Ubuntu ]]
+  then
+    echo "This script only supports Ubuntu or RedHat" >&2
+    exit 1
+  fi
 
-if ! [[ $DISTRIB_ID = Ubuntu ]]
-then
-  echo "This script only supports Ubuntu" >&2
-  exit 1
+  if ! [[ $DISTRIB_RELEASE = 16.04 ]]
+  then
+    echo "This script only supports 16.04 of Ubuntu" >&2
+    exit 1
+  fi
+  UBUNTU=true
 fi
 
-if ! [[ $DISTRIB_RELEASE = 16.04 ]]
-then
-  echo "This script only supports 16.04" >&2
-  exit 1
-fi
+# Helper function to execute following command only on Ubuntu
+function ubuntu {
+  if [[ "$UBUNTU" == true ]]; then
+    "$@"
+  fi
+}
 
-REAL_APT_GET=$(which apt-get)
+# Helper function to execute following command only on RedHat
+function redhat {
+  if [[ "$REDHAT" == true ]]; then
+    "$@"
+  fi
+}
+
+# Note that yum has its own retries; see yum.conf(5).
+REAL_APT_GET=$(ubuntu which apt-get)
 function apt-get {
   for ITER in $(seq 1 20); do
     echo "ATTEMPT: ${ITER}"
@@ -91,49 +116,55 @@ function apt-get {
   return 1
 }
 
-echo ">>> Installing packages"
-
-apt-get update
-apt-get --yes install apt-utils
-apt-get --yes install git
-
-echo ">>> Checking out Impala"
-
-# If there is no Impala git repo, get one now
-
-: ${IMPALA_HOME:=~/Impala}
-if ! [[ -d "$IMPALA_HOME" ]]
-then
-  time -p git clone https://git-wip-us.apache.org/repos/asf/impala.git "$IMPALA_HOME"
-fi
-cd "$IMPALA_HOME"
-SET_IMPALA_HOME="export IMPALA_HOME=$(pwd)"
-echo "$SET_IMPALA_HOME" >> ~/.bashrc
-eval "$SET_IMPALA_HOME"
-
 echo ">>> Installing build tools"
-apt-get --yes install ccache g++ gcc libffi-dev liblzo2-dev libkrb5-dev \
+ubuntu apt-get update
+ubuntu apt-get --yes install ccache g++ gcc libffi-dev liblzo2-dev libkrb5-dev \
         krb5-admin-server krb5-kdc krb5-user libsasl2-dev libsasl2-modules \
         libsasl2-modules-gssapi-mit libssl-dev make maven ninja-build ntp \
         ntpdate python-dev python-setuptools postgresql ssh wget vim-common psmisc \
-        lsof openjdk-8-jdk openjdk-8-source openjdk-8-dbg
+        lsof openjdk-8-jdk openjdk-8-source openjdk-8-dbg apt-utils git
+
+redhat sudo yum install -y curl gcc gcc-c++ git krb5-devel krb5-server krb5-workstation \
+        libevent-devel libffi-devel make ntp ntpdate openssl-devel cyrus-sasl \
+        cyrus-sasl-gssapi cyrus-sasl-devel cyrus-sasl-plain \
+        python-devel python-setuptools postgresql postgresql-server \
+        wget vim-common nscd cmake lzo-devel fuse-devel snappy-devel zlib-devel \
+        psmisc lsof openssh-server redhat-lsb java-1.8.0-openjdk-devel \
+        java-1.8.0-openjdk-src python-argparse
+
+# CentOS repos don't contain ccache, so install from EPEL
+redhat sudo yum install -y epel-release
+redhat sudo yum install -y ccache
+
+# Clean up yum caches
+redhat sudo yum clean all
+
+# Download ant and mvn for centos
+redhat sudo wget -nv \
+  https://www-us.apache.org/dist/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz \
+  https://www-us.apache.org/dist/ant/binaries/apache-ant-1.9.13-bin.tar.gz
+redhat sha512sum -c - <<< '2a803f578f341e164f6753e410413d16ab60fabe31dc491d1fe35c984a5cce696bc71f57757d4538fe7738be04065a216f3ebad4ef7e0ce1bb4c51bc36d6be86  apache-maven-3.5.4-bin.tar.gz'
+redhat sha512sum -c - <<< 'c8321aa223f70d7e64d3d0274263000cfffb46fbea61488534e26f9f0245d99e9872d0888e35cd3274416392a13f80c748c07750caaeffa5f9cae1220020715f  apache-ant-1.9.13-bin.tar.gz'
+redhat sudo tar -C /usr/local -xzf apache-maven-3.5.4-bin.tar.gz
+redhat sudo tar -C /usr/local -xzf apache-ant-1.9.13-bin.tar.gz
+redhat sudo ln -s /usr/local/apache-maven-3.5.4/bin/mvn /usr/local/bin
+redhat sudo ln -s /usr/local/apache-ant-1.9.13/bin/ant /usr/local/bin
 
 if ! { service --status-all | grep -E '^ \[ \+ \]  ssh$'; }
 then
-  sudo service ssh start
+  ubuntu sudo service ssh start
+  # TODO: CentOS/RH 7 uses systemd, and this doesn't work.
+  redhat sudo service sshd start
 fi
 
 # TODO: config ccache to give it plenty of space
 # TODO: check that there is enough space on disk to do a build and data load
 # TODO: make this work with non-bash shells
 
-SET_JAVA_HOME="export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64"
-echo "$SET_JAVA_HOME" >> "${IMPALA_HOME}/bin/impala-config-local.sh"
-eval "$SET_JAVA_HOME"
-
 echo ">>> Configuring system"
 
-sudo service ntp stop
+ubuntu sudo service ntp stop
+redhat sudo service ntpd stop
 sudo ntpdate us.pool.ntp.org
 # If on EC2, use Amazon's ntp servers
 if which dmidecode && { sudo dmidecode -s bios-version | grep amazon; }
@@ -146,24 +177,31 @@ fi
 # --privileged docker container, and a non-privileged container cannot run ntpdate, which
 # is strictly needed by Kudu.
 # TODO: Make privileged docker start ntpd
-sudo service ntp start || grep docker /proc/1/cgroup
+ubuntu sudo service ntp start || grep docker /proc/1/cgroup
+redhat sudo service ntpd start || grep docker /proc/1/cgroup
 
 # IMPALA-3932, IMPALA-3926
-if [[ $DISTRIB_RELEASE = 16.04 ]]
+if [[ $UBUNTU = true && $DISTRIB_RELEASE = 16.04 ]]
 then
   SET_LD_LIBRARY_PATH='export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}'
+  echo "$SET_LD_LIBRARY_PATH" >> "${IMPALA_HOME}/bin/impala-config-local.sh"
+  eval "$SET_LD_LIBRARY_PATH"
 fi
-echo "$SET_LD_LIBRARY_PATH" >> "${IMPALA_HOME}/bin/impala-config-local.sh"
-eval "$SET_LD_LIBRARY_PATH"
-
-# TODO: What are the security implications of this?
-for PG_AUTH_FILE in /etc/postgresql/*/main/pg_hba.conf
-do
-  sudo sed -ri 's/local +all +all +peer/local all all trust/g' $PG_AUTH_FILE
-done
-sudo service postgresql restart
-sudo /etc/init.d/postgresql reload
-sudo service postgresql restart
+
+redhat sudo service postgresql initdb
+sudo service postgresql stop
+
+# These configurations expose connecting to PostgreSQL via md5-hashed
+# passwords over TCP to localhost, and the local socket is trusted
+# widely.
+ubuntu sudo sed -ri 's/local +all +all +peer/local all all trust/g' \
+  /etc/postgresql/*/main/pg_hba.conf
+redhat sudo sed -ri 's/local +all +all +ident/local all all trust/g' \
+  /var/lib/pgsql/data/pg_hba.conf
+# Accept md5 passwords from localhost
+redhat sudo sed -i -e 's,\(host.*\)ident,\1md5,' /var/lib/pgsql/data/pg_hba.conf
+
+sudo service postgresql start
 
 # Set up postgress for HMS
 if ! [[ 1 = $(sudo -u postgres psql -At -c "SELECT count(*) FROM pg_roles WHERE rolname = 'hiveuser';") ]]
@@ -220,6 +258,38 @@ sudo chown $(whoami) /var/lib/hadoop-hdfs/
 # TODO: restrict this to only the users it is needed for
 echo "* - nofile 1048576" | sudo tee -a /etc/security/limits.conf
 
+# Default on CentOS limits a user to 1024 processes (threads) , which isn't
+# enough for minicluster with all of its friends.
+redhat sudo sed -i 's,\*\s*soft\s*nproc\s*1024,* soft nproc unlimited,' \
+  /etc/security/limits.d/90-nproc.conf
+
+echo ">>> Checking out Impala"
+
+# If there is no Impala git repo, get one now
+if ! [[ -d "$IMPALA_HOME" ]]
+then
+  time -p git clone https://git-wip-us.apache.org/repos/asf/impala.git "$IMPALA_HOME"
+fi
+cd "$IMPALA_HOME"
+SET_IMPALA_HOME="export IMPALA_HOME=$(pwd)"
+echo "$SET_IMPALA_HOME" >> ~/.bashrc
+eval "$SET_IMPALA_HOME"
+
+# Ubuntu and RH install JDK's in slightly different paths.
+if [[ $UBUNTU == true ]]; then
+  SET_JAVA_HOME="export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64"
+else
+  # Assert that there's only one glob match.
+  [ 1 == $(compgen -G "/usr/lib/jvm/java-1.8.0-openjdk-*" | wc -l) ]
+  SET_JAVA_HOME="export JAVA_HOME=$(compgen -G '/usr/lib/jvm/java-1.8.0-openjdk-*')"
+fi
+
+echo "$SET_JAVA_HOME" >> "${IMPALA_HOME}/bin/impala-config-local.sh"
+eval "$SET_JAVA_HOME"
+
+# Assert that we have a java available
+test -f $JAVA_HOME/bin/java
+
 # LZO is not needed to compile or run Impala, but it is needed for the data load
 echo ">>> Checking out Impala-lzo"
 : ${IMPALA_LZO_HOME:="${IMPALA_HOME}/../Impala-lzo"}

http://git-wip-us.apache.org/repos/asf/impala/blob/c1701074/docker/entrypoint.sh
----------------------------------------------------------------------
diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index 205bd31..1dbc6c1 100755
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -67,12 +67,24 @@ function build() {
     paste <(cut -d : -f3 /etc/passwd) <(cut -d : -f1 /etc/passwd) | sort -n
     exit 1
   fi
-  apt-get update
-  apt-get install -y sudo git lsb-release python
+  if which apt-get > /dev/null; then
+    apt-get update
+    apt-get install -y sudo git lsb-release python
+  else
+    yum -y install sudo git python
+  fi
 
-  adduser --disabled-password --gecos "" --uid $1 impdev
-  echo "impdev ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
+  if ! id impdev; then
+    # Adduser is slightly different on CentOS and Ubuntu
+    if which apt-get; then
+      adduser --disabled-password --gecos "" --uid $1 impdev
+    else
+      adduser --uid $1 impdev
+    fi
+    echo "impdev ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
+  fi
 
+  ulimit -a
   su impdev -c "$0 build_impdev"
 }
 
@@ -120,7 +132,11 @@ function start_minicluster {
   sudo service postgresql start
 
   # Required for starting HBase
-  sudo service ssh start
+  if [ -f /etc/redhat-release ]; then
+    sudo service sshd start
+  else
+    sudo service ssh start
+  fi
 
   (echo ">>> Copying Kudu Data") 2> /dev/null
   # Move around Kudu's WALs to avoid issue with Docker filesystems (aufs and
@@ -162,6 +178,11 @@ function build_impdev() {
   # Assert we're impdev now.
   [ "$(id -un)" = impdev ]
 
+  # Bump "Max processes" ulimit to the hard limit; default
+  # on CentOS 6 can be 1024, which isn't enough for minicluster.
+  ulimit -u $(cat /proc/self/limits | grep 'Max processes' | awk '{ print $4 }')
+  ulimit -a
+
   # Link in ccache from host.
   ln -s /ccache /home/impdev/.ccache
 
@@ -376,6 +397,7 @@ function main() {
   echo ">>> ${CMD} $@ (begin)"
   # Dump environment, for debugging
   env | grep -vE "AWS_(SECRET_)?ACCESS_KEY"
+  ulimit -a
   set -x
   # The "| cat" here avoids "set -e"/errexit from exiting the
   # script right away.

http://git-wip-us.apache.org/repos/asf/impala/blob/c1701074/docker/test-with-docker.py
----------------------------------------------------------------------
diff --git a/docker/test-with-docker.py b/docker/test-with-docker.py
index 88f4563..b350e4c 100755
--- a/docker/test-with-docker.py
+++ b/docker/test-with-docker.py
@@ -155,6 +155,8 @@ def main():
       action='store_true', default=True,
       help="Whether to remove image when done.")
   group.add_argument('--no-cleanup-image', dest="cleanup_image", action='store_false')
+  parser.add_argument('--base-image', dest="base_image", default="ubuntu:16.04",
+      help="Base OS image to use. ubuntu:16.04 and centos:6 are known to work.")
   parser.add_argument(
       '--build-image', metavar='IMAGE',
       help='Skip building, and run tests on pre-existing image.')
@@ -203,7 +205,7 @@ def main():
       suite_concurrency=args.suite_concurrency,
       impalad_mem_limit_bytes=args.impalad_mem_limit_bytes,
       tail=args.tail,
-      env=args.env)
+      env=args.env, base_image=args.base_image)
 
   fh = logging.FileHandler(os.path.join(_make_dir_if_not_exist(t.log_dir), "log.txt"))
   fh.setFormatter(logging.Formatter(LOG_FORMAT))
@@ -436,7 +438,7 @@ class TestWithDocker(object):
   def __init__(self, build_image, suite_names, name, cleanup_containers,
                cleanup_image, ccache_dir, test_mode,
                suite_concurrency, parallel_test_concurrency,
-               impalad_mem_limit_bytes, tail, env):
+               impalad_mem_limit_bytes, tail, env, base_image):
     self.build_image = build_image
     self.name = name
     self.containers = []
@@ -473,6 +475,7 @@ class TestWithDocker(object):
     self.impalad_mem_limit_bytes = impalad_mem_limit_bytes
     self.tail = tail
     self.env = env
+    self.base_image = base_image
 
     # Map suites back into objects; we ignore case for this mapping.
     suites = []
@@ -509,9 +512,18 @@ class TestWithDocker(object):
       extras = ["-e", "TEST_TEST_WITH_DOCKER=true"] + extras
 
     # According to localtime(5), /etc/localtime is supposed
-    # to be a symlink to somewhere inside /usr/share/zoneinfo
+    # to be a symlink to somewhere inside /usr/share/zoneinfo.
+    # Note that sometimes the symlink tree may be
+    # complicated, e.g.:
+    #  /etc/localtime ->
+    #    /usr/share/zoneinfo/America/Los_Angeles ->  (readlink)
+    #      ../US/Pacific-New                         (realpath)
+    # Using both readlink and realpath should work, but we've
+    # encountered one scenario (centos:6) where the Java tzdata
+    # database doesn't have US/Pacific-New, but has America/Los_Angeles.
+    # This is deemed sufficient to tip the scales to using readlink.
     assert os.path.islink("/etc/localtime")
-    localtime_link_target = os.path.realpath("/etc/localtime")
+    localtime_link_target = os.readlink("/etc/localtime")
     assert localtime_link_target.startswith("/usr/share/zoneinfo")
 
     # Workaround for what appears to be https://github.com/moby/moby/issues/13885
@@ -624,7 +636,7 @@ class TestWithDocker(object):
   def _create_build_image(self):
     """Creates the "build image", with Impala compiled and data loaded."""
     container = self._create_container(
-        image="ubuntu:16.04", name=self.name + "-build",
+        image=self.base_image, name=self.name + "-build",
         logdir="build",
         logname="log-build.txt",
         # entrypoint.sh will create a user with our uid; this
@@ -769,6 +781,7 @@ class TestSuiteRunner(object):
     # io-file-mgr-test expects a real-ish file system at /tmp;
     # we mount a temporary directory into the container to appease it.
     tmpdir = tempfile.mkdtemp(prefix=test_with_docker.name + "-" + self.name)
+    os.chmod(tmpdir, 01777)
     # Container names are sometimes used as hostnames, and DNS names shouldn't
     # have underscores.
     container_name = test_with_docker.name + "-" + self.name.replace("_", "-")