Posted to commits@kudu.apache.org by gr...@apache.org on 2020/04/21 17:03:03 UTC

[kudu] branch master updated (9ec5727 -> 98b1c5d)

This is an automated email from the ASF dual-hosted git repository.

granthenke pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git.


    from 9ec5727  [docs] add guide to symbolize stack addresses
     new 386cc74  [docs] Update schema documentation
     new 98b1c5d  [docker] Fix mini-ranger tests in the build image

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docker/Dockerfile           | 108 ++++++++++++++++++++++++++++++--------------
 docker/bootstrap-dev-env.sh |   2 +
 docs/schema_design.adoc     |  39 ++++++++++++----
 3 files changed, 106 insertions(+), 43 deletions(-)


[kudu] 02/02: [docker] Fix mini-ranger tests in the build image

Posted by gr...@apache.org.

granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 98b1c5d843d4df306373a6aad4885658e07719f1
Author: Grant Henke <gr...@apache.org>
AuthorDate: Thu Apr 16 10:34:50 2020 -0500

    [docker] Fix mini-ranger tests in the build image
    
    This patch fixes the mini-ranger tests when run in Docker. A few changes
    were made to support running these tests:
    - Keep the PostgreSQL and Ranger sources in thirdparty
    - Run the Kudu build and tests as the Kudu user
    - Ensure the kudu user owns all the files
    
    Additionally, to simplify testing and improve the usability of the build
    images, a few other changes were made:
    - The sudo package was added and the kudu user was made a sudoer
    - The default entrypoint for build images is `/bin/bash`
    
    Change-Id: I41c9c9ca8bb02a6d9d6e16b3197a1e883f642098
    Reviewed-on: http://gerrit.cloudera.org:8080/15756
    Tested-by: Kudu Jenkins
    Reviewed-by: Attila Bukor <ab...@apache.org>
---
 docker/Dockerfile           | 108 ++++++++++++++++++++++++++++++--------------
 docker/bootstrap-dev-env.sh |   2 +
 2 files changed, 77 insertions(+), 33 deletions(-)

diff --git a/docker/Dockerfile b/docker/Dockerfile
index 6afcc0d..8cf3fa5 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -53,6 +53,9 @@ LABEL org.label-schema.name="Apache Kudu Runtime Base" \
       org.label-schema.vcs-url=$VCS_URL \
       org.label-schema.version=$VERSION
 
+# Entry point to bash.
+CMD ["/bin/bash"]
+
 #
 # ---- Dev ----
 # Builds a base image that has all the development libraries for Kudu pre-installed.
@@ -72,6 +75,9 @@ RUN ./bootstrap-dev-env.sh \
 
 ENV PATH /usr/lib/ccache:/usr/lib64/ccache/:$PATH
 
+# Entry point to bash.
+CMD ["/bin/bash"]
+
 # Common label arguments.
 # VCS_REF is not specified to improve docker caching.
 ARG DOCKERFILE
@@ -99,21 +105,34 @@ LABEL org.label-schema.name="Apache Kudu Development Base" \
 #
 FROM dev AS thirdparty
 
-WORKDIR /kudu
+ARG UID=1000
+ARG GID=1000
+ARG BUILD_DIR="/kudu"
+
+# Set up the kudu user and create the necessary directories.
+# We do this before copying any files, otherwise the image size is doubled by the chown change.
+RUN groupadd -g ${GID} kudu || groupmod -n kudu $(getent group ${GID} | cut -d: -f1) \
+    && useradd --shell /bin/bash -u ${UID} -g ${GID} -m kudu \
+    && echo 'kudu ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers \
+    && mkdir -p ${BUILD_DIR} && chown -R kudu:kudu ${BUILD_DIR}
+# Run the build as the kudu user.
+USER kudu
+
+WORKDIR ${BUILD_DIR}
 # We only copy the needed files for thirdparty so docker can handle caching.
-COPY ./thirdparty thirdparty
-COPY ./build-support/enable_devtoolset.sh \
+COPY --chown=kudu:kudu ./thirdparty thirdparty
+COPY --chown=kudu:kudu ./build-support/enable_devtoolset.sh \
   ./build-support/enable_devtoolset_inner.sh \
   build-support/
-COPY ./build-support/ccache-clang build-support/ccache-clang
-COPY ./build-support/ccache-devtoolset-3 build-support/ccache-devtoolset-3
+COPY --chown=kudu:kudu ./build-support/ccache-clang build-support/ccache-clang
+COPY --chown=kudu:kudu ./build-support/ccache-devtoolset-3 build-support/ccache-devtoolset-3
 RUN build-support/enable_devtoolset.sh \
   thirdparty/build-if-necessary.sh \
   # Remove the files left behind that we don't need.
-  # Remove all the source files except the hive, hadoop, and sentry sources
+  # Remove all the source files except the hadoop, hive, postgresql, ranger, and sentry sources
   # which are pre-built and symlinked into the installed/common/opt directory.
   && find thirdparty/src/* -maxdepth 0 -type d  \
-    \( ! -name 'hadoop-*' ! -name 'hive-*' ! -name 'sentry-*' \) \
+    \( ! -name 'hadoop-*' ! -name 'hive-*' ! -name 'postgresql-*' ! -name 'ranger-*' ! -name 'sentry-*' \) \
     -prune -exec rm -rf {} \; \
   # Remove all the build files except the llvm build which is symlinked into
   # the clang-toolchain directory.
@@ -129,6 +148,9 @@ ARG VCS_TYPE
 ARG VCS_URL
 ARG VERSION
 
+# Entry point to bash.
+CMD ["/bin/bash"]
+
 LABEL name="Apache Kudu Thirdparty" \
       description="An image that has Kudu's thirdparty dependencies pre-built." \
       # Common labels.
@@ -147,6 +169,7 @@ LABEL name="Apache Kudu Thirdparty" \
 #
 FROM thirdparty AS build
 
+ARG BUILD_DIR="/kudu"
 ARG BUILD_TYPE=release
 ARG LINK_TYPE=static
 ARG STRIP=1
@@ -156,23 +179,25 @@ ARG VCS_REF
 
 # Use the bash shell for all RUN commands.
 SHELL ["/bin/bash", "-c"]
+# Run the build as the kudu user.
+USER kudu
 
-WORKDIR /kudu
+WORKDIR ${BUILD_DIR}
 # Copy the C++ build source.
 # We copy the minimal source to optimize docker cache hits.
-COPY ./build-support build-support
-COPY ./docs/support docs/support
-COPY ./cmake_modules cmake_modules
-COPY ./examples/cpp examples/cpp
-COPY ./src src
-COPY ./CMakeLists.txt ./version.txt ./
+COPY --chown=kudu:kudu ./build-support build-support
+COPY --chown=kudu:kudu ./docs/support docs/support
+COPY --chown=kudu:kudu ./cmake_modules cmake_modules
+COPY --chown=kudu:kudu ./examples/cpp examples/cpp
+COPY --chown=kudu:kudu ./src src
+COPY --chown=kudu:kudu ./CMakeLists.txt ./version.txt ./
 
 # Copy the java build source.
 # Some parts of the C++ build depend on Java code.
-COPY ./java /kudu/java
+COPY --chown=kudu:kudu ./java ${BUILD_DIR}/java
 
 # Build the c++ code.
-WORKDIR /kudu/build/$BUILD_TYPE
+WORKDIR ${BUILD_DIR}/build/$BUILD_TYPE
 # Ensure we don't rebuild thirdparty. Instead let docker handle caching.
 ENV NO_REBUILD_THIRDPARTY=1
 RUN ../../build-support/enable_devtoolset.sh \
@@ -186,25 +211,29 @@ RUN ../../build-support/enable_devtoolset.sh \
   && make -j${PARALLEL} \
   # Install the client libraries for the python build to use.
   # TODO: Use custom install location when the python build can be configured to use it.
-  && make install \
+  && sudo make install \
   # Strip the binaries to reduce the images size.
   && if [ "$STRIP" == "1" ]; then find "bin" -name "kudu*" -type f -exec strip {} \;; fi \
   # Strip the client libraries to reduce the images size
   && if [[ "$STRIP" == "1" ]]; then find "/usr/local" -name "libkudu*" -type f -exec strip {} \;; fi
 
 # Build the java code.
-WORKDIR /kudu/java
+WORKDIR ${BUILD_DIR}/java
 RUN ./gradlew jar
 
 # Copy the python build source.
-COPY ./python /kudu/python
+COPY --chown=kudu:kudu ./python ${BUILD_DIR}/python
 # Build the python code.
-WORKDIR /kudu/python
-RUN pip install -r requirements.txt \
+WORKDIR ${BUILD_DIR}/python
+RUN pip install --user -r requirements.txt \
   && python setup.py sdist
 
 # Copy any remaining source files.
-COPY . /kudu
+WORKDIR ${BUILD_DIR}
+COPY --chown=kudu:kudu . ${BUILD_DIR}
+
+# Entry point to bash.
+CMD ["/bin/bash"]
 
 # Common label arguments.
 ARG DOCKERFILE
@@ -234,24 +263,39 @@ LABEL name="Apache Kudu Build" \
 #
 FROM runtime AS kudu-python
 
+ARG UID=1000
+ARG GID=1000
+ARG BUILD_DIR="/kudu"
+ARG INSTALL_DIR="/opt/kudu"
+
+# Set up the kudu user and create the necessary directories.
+# We do this before copying any files, otherwise the image size is doubled by the chown change.
+RUN groupadd -g ${GID} kudu || groupmod -n kudu $(getent group ${GID} | cut -d: -f1) \
+    && useradd --shell /bin/bash -u ${UID} -g ${GID} -m kudu \
+    && mkdir -p ${INSTALL_DIR} && chown -R kudu:kudu ${INSTALL_DIR}
+
 COPY ./docker/bootstrap-python-env.sh /
 RUN ./bootstrap-python-env.sh \
   && rm bootstrap-python-env.sh
 
-ARG INSTALL_DIR="/opt/kudu"
+# Install as the kudu user.
+USER kudu
 
 ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64
 WORKDIR $INSTALL_DIR/python
 # Copy the requirements file.
 COPY ./python/requirements.txt requirements.txt
-COPY --from=build /usr/local/lib/libkudu_client* /usr/local/lib/
-COPY --from=build /usr/local/include/kudu /usr/local/include/kudu
-COPY --from=build /kudu/python/dist/kudu-python-*.tar.gz .
-RUN pip install -r requirements.txt \
+COPY --chown=kudu:kudu --from=build /usr/local/lib/libkudu_client* /usr/local/lib/
+COPY --chown=kudu:kudu --from=build /usr/local/include/kudu /usr/local/include/kudu
+COPY --chown=kudu:kudu --from=build ${BUILD_DIR}/python/dist/kudu-python-*.tar.gz .
+RUN pip install --user -r requirements.txt \
     && rm -rf requirements.txt \
-    && pip install kudu-python-*.tar.gz \
+    && pip install --user kudu-python-*.tar.gz \
     && rm -rf kudu-python-*.tar.gz
 
+# Entry point to Python.
+CMD ["python"]
+
 ARG DOCKERFILE
 ARG MAINTAINER
 ARG URL
@@ -271,9 +315,6 @@ LABEL org.label-schema.name="Apache Kudu Python Client" \
       org.label-schema.vcs-url=$VCS_URL \
       org.label-schema.version=$VERSION
 
-# Entry point to the python.
-CMD ["python"]
-
 #
 # ---- Kudu ----
 # Builds a runtime image with the Kudu binaries pre-installed.
@@ -282,6 +323,7 @@ FROM runtime AS kudu
 
 ARG UID=1000
 ARG GID=1000
+ARG BUILD_DIR="/kudu"
 ARG INSTALL_DIR="/opt/kudu"
 ARG DATA_DIR="/var/lib/kudu"
 
@@ -294,13 +336,13 @@ RUN groupadd -g ${GID} kudu || groupmod -n kudu $(getent group ${GID} | cut -d:
 
 # Copy the binaries.
 WORKDIR $INSTALL_DIR/bin
-COPY --chown=kudu:kudu --from=build /kudu/build/latest/bin/kudu ./
+COPY --chown=kudu:kudu --from=build ${BUILD_DIR}/build/latest/bin/kudu ./
 # Add to the binaries to the path.
 ENV PATH=$INSTALL_DIR/bin/:$PATH
 
 # Copy the web files.
 WORKDIR $INSTALL_DIR
-COPY --chown=kudu:kudu --from=build /kudu/www ./www
+COPY --chown=kudu:kudu --from=build ${BUILD_DIR}/www ./www
 
 # Copy the entrypoint script.
 COPY --chown=kudu:kudu ./docker/kudu-entrypoint.sh /
diff --git a/docker/bootstrap-dev-env.sh b/docker/bootstrap-dev-env.sh
index fcdc496..8d4fbbc 100755
--- a/docker/bootstrap-dev-env.sh
+++ b/docker/bootstrap-dev-env.sh
@@ -57,6 +57,7 @@ if [[ -f "/usr/bin/yum" ]]; then
     pkgconfig \
     redhat-lsb-core \
     rsync \
+    sudo \
     unzip \
     vim-common \
     which \
@@ -148,6 +149,7 @@ elif [[ -f "/usr/bin/apt-get" ]]; then
     pkg-config \
     python \
     rsync \
+    sudo \
     unzip \
     vim-common \
     wget
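
The source-pruning filter changed in the thirdparty stage above can be
sketched in Python to show which directories survive. This is only an
illustration of the name matching, not what the image runs (the image uses
`find ... -prune -exec rm -rf`). The function name and the example directory
names and versions are hypothetical.

```python
from fnmatch import fnmatchcase

# Name patterns kept by the updated find expression in the Dockerfile.
KEPT_PATTERNS = ("hadoop-*", "hive-*", "postgresql-*", "ranger-*", "sentry-*")


def dirs_to_remove(dir_names, kept_patterns=KEPT_PATTERNS):
    """Return the directory names that match none of the kept patterns."""
    return [
        name
        for name in dir_names
        if not any(fnmatchcase(name, pattern) for pattern in kept_patterns)
    ]


# Hypothetical contents of thirdparty/src; the versions are made up.
example_dirs = [
    "hadoop-3.2.0",
    "ranger-2.0.0",
    "postgresql-12.2",
    "glog-0.3.5",
    "llvm-9.0.0.src",
]
removed = dirs_to_remove(example_dirs)
print(removed)
```

Before this change the `postgresql-*` and `ranger-*` patterns were absent, so
those sources would have been removed as well.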


[kudu] 01/02: [docs] Update schema documentation

Posted by gr...@apache.org.

granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 386cc74c1a15d10e85989b67e92cfe3d6b134f44
Author: Grant Henke <gr...@apache.org>
AuthorDate: Sun Apr 19 10:39:29 2020 -0500

    [docs] Update schema documentation
    
    This patch adds more details on the VARCHAR type to the schema
    docs. It also adds the DATE type and includes a small update to
    remove the explicit HBase callout.
    
    Change-Id: I681e0af517b08c348420b3b217c393797717d3fc
    Reviewed-on: http://gerrit.cloudera.org:8080/15757
    Tested-by: Kudu Jenkins
    Reviewed-by: Volodymyr Verovkin <ve...@cloudera.com>
    Reviewed-by: Hao Hao <ha...@cloudera.com>
---
 docs/schema_design.adoc | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 9b05991..0c1e0a6 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -72,13 +72,14 @@ column types include:
 * 16-bit signed integer
 * 32-bit signed integer
 * 64-bit signed integer
+* date (32-bit days since the Unix epoch)
 * unixtime_micros (64-bit microseconds since the Unix epoch)
 * single-precision (32-bit) IEEE-754 floating-point number
 * double-precision (64-bit) IEEE-754 floating-point number
 * decimal (see <<decimal>> for details)
+* varchar (see <<varchar>> for details)
 * UTF-8 encoded string (up to 64KB uncompressed)
 * binary (up to 64KB uncompressed)
-* VARCHAR type with configurable maximum length (up to 64KB uncompressed)
 
 Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
 format to provide efficient encoding and serialization. To make the most of
@@ -90,9 +91,9 @@ be specified on a per-column basis.
 [[no_version_column]]
 [IMPORTANT]
 .No Version or Timestamp Column
-Unlike HBase, Kudu does not provide a version or timestamp column to track changes
-to a row. If version or timestamp information is needed, the schema should include
-an explicit version or timestamp column.
+Kudu does not provide a version or timestamp column to track changes to a row.
+If version or timestamp information is needed, the schema should include an
+explicit version or timestamp column.
 
 [[decimal]]
 === Decimal Type
@@ -136,6 +137,24 @@ Before encoding and compression:
 NOTE: The precision and scale of `decimal` columns cannot be changed by altering
 the table.
 
+[[varchar]]
+=== Varchar Type
+
+The `varchar` type is a UTF-8 encoded string (up to 64KB uncompressed) with a
+fixed maximum character length. This type is especially useful when migrating
+from or integrating with legacy systems that support the `varchar` type.
+If a maximum character length is not required the `string` type should be
+used instead.
+
+The `varchar` type is a parameterized type that takes a length attribute.
+
+*Length* represents the maximum number of UTF-8 characters allowed. Values
+with characters greater than the limit will be truncated. This value must
+be between 1 and 65535 and has no default. Note that some other systems
+may represent the length limit in bytes instead of characters. That means
+that Kudu may be able to represent longer values in the case of multi-byte
+UTF-8 characters.
+
 [[encoding]]
 === Column Encoding
 
@@ -145,12 +164,12 @@ of the column.
 .Encoding Types
 [options="header"]
 |===
-| Column Type             | Encoding                       | Default
-| int8, int16, int32      | plain, bitshuffle, run length  | bitshuffle
-| int64, unixtime_micros  | plain, bitshuffle, run length  | bitshuffle
-| float, double, decimal  | plain, bitshuffle              | bitshuffle
-| bool                    | plain, run length              | run length
-| string, binary, varchar | plain, prefix, dictionary      | dictionary
+| Column Type               | Encoding                       | Default
+| int8, int16, int32, int64 | plain, bitshuffle, run length  | bitshuffle
+| date, unixtime_micros     | plain, bitshuffle, run length  | bitshuffle
+| float, double, decimal    | plain, bitshuffle              | bitshuffle
+| bool                      | plain, run length              | run length
+| string, varchar, binary   | plain, prefix, dictionary      | dictionary
 |===
 
 [[plain]]
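
The character-based length limit described in the new Varchar Type section
can be sketched in Python as follows. This is a minimal illustration, not
Kudu's implementation: the function names are hypothetical, and it assumes
that a "character" means a Unicode code point. The second function models a
system that limits length in bytes, for comparison.

```python
def truncate_varchar(value: str, length: int) -> str:
    """Truncate to at most `length` characters (code points)."""
    if not 1 <= length <= 65535:
        raise ValueError("length must be between 1 and 65535")
    return value[:length]


def truncate_by_bytes(value: str, max_bytes: int) -> str:
    """Truncate to at most `max_bytes` UTF-8 bytes, dropping any partial character."""
    return value.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


text = "héllo wörld"
by_chars = truncate_varchar(text, 5)    # keeps 5 characters
by_bytes = truncate_by_bytes(text, 5)   # keeps at most 5 bytes

print(by_chars, len(by_chars.encode("utf-8")))
print(by_bytes, len(by_bytes.encode("utf-8")))
```

With a limit of 5, the character-based result keeps "héllo" (6 bytes in
UTF-8), while the byte-based result keeps only "héll". This is the sense in
which a character limit can hold longer values than a byte limit when
multi-byte characters are present.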