Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/02/02 19:06:00 UTC

[jira] [Comment Edited] (DRILL-7563) Docker & Kubernetes Drill server container

    [ https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028499#comment-17028499 ] 

Paul Rogers edited comment on DRILL-7563 at 2/2/20 7:05 PM:
------------------------------------------------------------

h4. Launch Script Extensions

When we created Drill-on-YARN, we found we had to adjust several of Drill's launch scripts to align them with the way YARN works. The same will likely be true for Docker and K8s. The goal for Drill 1.17 is to work around these limitations so we can use the official 1.17 release as the basis. For 1.18 and later, we have an opportunity to improve the scripts.

* It is generally a good idea to make Docker images as small as possible. Add a script to strip out unneeded files. For example, a Drillbit container does not need Sqlline or DoY. Alternatively, provide a Docker-specific build which excludes unneeded files.
* Launch option, similar to {{run}}, which will run Drill as PID 1 (so it can receive shutdown signals), and which writes logs to stdout (typical of K8s pods). This means, at least, changing the "Starting drillbit, logging to /var/log/drill/drillbit.out" message.
* Environment variable for the ZK connect string, rather than burying it in {{drill-override.conf}}. This can be as simple as: if {{ZK_CONNECT}} is set, add {{-Ddrill.exec.zk.connect=$ZK_CONNECT}} to {{DRILL_JAVA_OPTS}}.
* Provide another layer of "distrib" configuration for {{drill-override.conf}} similar to that provided by {{distrib-env.sh}}. Use this to configure the local UDF registry. That way, the "user's" version of the file can be set from a K8s config map.
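
The {{ZK_CONNECT}} item above could be sketched roughly as follows. This is only an illustration, not code from Drill's existing scripts: the function name {{add_zk_connect_opt}} is hypothetical, and where it would be called from (presumably before the launch script starts the JVM) is an assumption. Only {{ZK_CONNECT}}, {{DRILL_JAVA_OPTS}} and {{drill.exec.zk.connect}} come from the proposal itself.

```shell
#!/usr/bin/env bash
# Sketch only: add_zk_connect_opt is a hypothetical helper name,
# not a function that exists in Drill's launch scripts.

# If ZK_CONNECT is set and non-empty, append the matching system property
# to DRILL_JAVA_OPTS, keeping any options the user already supplied.
add_zk_connect_opt() {
  if [ -n "${ZK_CONNECT:-}" ]; then
    # ${VAR:+...} expands to the existing options plus a separating space
    # only when DRILL_JAVA_OPTS is already non-empty.
    DRILL_JAVA_OPTS="${DRILL_JAVA_OPTS:+$DRILL_JAVA_OPTS }-Ddrill.exec.zk.connect=$ZK_CONNECT"
  fi
  export DRILL_JAVA_OPTS
}
```

With something like this in place, the connect string could be supplied from outside the image, for example with {{docker run -e ZK_CONNECT=...}} or from a K8s config map, without editing {{drill-override.conf}}.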


> Docker & Kubernetes Drill server container
> ------------------------------------------
>
>                 Key: DRILL-7563
>                 URL: https://issues.apache.org/jira/browse/DRILL-7563
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run Drill in a K8s (or OpenShift) cluster. In addition, we need a container to run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline.
> * Exposes Drill's four ports.
> * Accepts as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current embedded image) and which:
> * Accepts as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, or stored on the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK instance(s).
> * The Sqlline container requires only the ZK instances, from which it can find the Drillbit.
> Users will want to customize some parts of Drill: at least memory, and perhaps any of the other options. Provide a way to pass this information into the container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not the Sqlline) container should be designed to work with K8s. An example set of K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or similar tools.
> * Pass Drill configuration (both HOCON and environment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML files, then use {{kubectl}} to launch, monitor and stop Drill. The user sets cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. K8s operators are often written in Go, though doing so is not necessary. Drill already includes Drill-on-YARN which is, essentially, a "YARN operator." Repurpose this code to work with K8s as the target cluster manager rather than YARN. Reuse the same operations from DoY: configure, start, resize and stop a cluster.
>  


