Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/02/19 18:56:00 UTC

[jira] [Created] (DRILL-7593) Standardize local paths

Paul Rogers created DRILL-7593:
----------------------------------

             Summary: Standardize local paths
                 Key: DRILL-7593
                 URL: https://issues.apache.org/jira/browse/DRILL-7593
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.17.0
            Reporter: Paul Rogers


The work on DRILL-7589 (PR #1987) raised the idea of standardizing the set of local file system paths that Drill uses when it runs in embedded mode. There may also be an opportunity to unify the local file system paths used in distributed mode.

In distributed mode, we use ZK for distribution, and all shared data must be in a location visible to all Drillbits: either ZK or a DFS. Some local storage is still needed, such as for UDF staging and for spill files.

In local mode, all persistent storage occurs on the local file system; there is no ZK and there is no need to coordinate a set of Drillbits.

At present, the local paths are spread all over the config system. Code that wants to set up local paths (such as {{DirTestWatcher}}) must handle each directory specially. Then, either {{ClusterFixture}} or a unit test must set the proper config property to match the directory selection.

For example, from {{drill-module.conf}}:

{noformat}
drill.tmp-dir: "/tmp"
drill.tmp-dir: ${?DRILL_TMP_DIR}
...
 sys.store.provider: {
 local: {
 path: "/tmp/drill",
 }
 trace: {
 directory: "/tmp/drill-trace",
 filesystem: "file:///"
 },
 tmp: {
 directories: ["/tmp/drill"],
 filesystem: "drill-local:///"
 },
 compile: {
 code_dir: "/tmp/drill/codegen"
...
 spill: {
 // *** Options common to all the operators that may spill
 // File system to use. Local file system by default.
 fs: "file:///",
 // List of directories to use. Directories are created
 // if they do not exist.
 directories: [ "/tmp/drill/spill" ]
...
 udf: {
 directory: {
 // Base directory for remote and local udf directories, unique among clusters.
{noformat}

There are probably more. To move where Drill stores temp files, the user must change all of these properties.

Fortunately, [~arina] did a nice job with the UDF directories: they are all computed from the base directory:

{noformat}
 directory: {
   // Base directory for remote and local udf directories, unique among clusters.
   base: ${drill.exec.zk.root}"/udf",

   // Path to local udf directory, always created on local file system.
   // Root for these directory is generated at runtime unless Drill temporary directory is set.
   local: ${drill.exec.udf.directory.base}"/udf/local",

   // Set this property if custom file system should be used to create remote directories, ex: fs: "file:///".
   // fs: "",
   // Set this property if custom absolute root should be used for remote directories, ex: root: "/app/drill".
   // root: "",

   // Relative path to all remote udf directories.
   // Directories are created under default file system taken from Hadoop configuration
   // unless ${drill.exec.udf.directory.fs} is set.
   // User home directory is used as root unless ${drill.exec.udf.directory.root} is set.
   staging: ${drill.exec.udf.directory.base}"/staging",
   registry: ${drill.exec.udf.directory.base}"/registry",
   tmp: ${drill.exec.udf.directory.base}"/tmp"
 }
{noformat}

So, can we do the same thing for all the other local directories? Allow each to be custom-set, but by default compute each from a single base directory. That way, if a unit test or an install wants to move the Drill local directories to, say, {{/var/drill/tmp}}, it need only change a single config line and everything else follows automatically.

This can be done in the existing conf file as was done for UDFs. And, I guess to preserve compatibility, we'd have to leave the properties where they are; we'd just change their values.
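
As a rough sketch of what that could look like (not a worked-out design), the fragment below reuses the existing {{drill.tmp-dir}} property as the single base and rewrites the values shown earlier as substitutions. Treating {{drill.tmp-dir}} as the base is an assumption made here for illustration; a new, dedicated base property would work the same way. Parent keys are elided with {{...}} exactly as in the excerpt above, and each computed default resolves to the same path it has today.

{noformat}
// Assumption: drill.tmp-dir serves as the single base for local paths.
drill.tmp-dir: "/tmp"
drill.tmp-dir: ${?DRILL_TMP_DIR}
...
 sys.store.provider: {
   local: {
     // Resolves to "/tmp/drill" with the default base.
     path: ${drill.tmp-dir}"/drill",
   }
 trace: {
   directory: ${drill.tmp-dir}"/drill-trace",
   filesystem: "file:///"
 },
 tmp: {
   directories: [ ${drill.tmp-dir}"/drill" ],
   filesystem: "drill-local:///"
 },
 compile: {
   code_dir: ${drill.tmp-dir}"/drill/codegen"
...
 spill: {
   fs: "file:///",
   directories: [ ${drill.tmp-dir}"/drill/spill" ]
{noformat}

Under this sketch, a test or install that sets only {{drill.tmp-dir: "/var/drill/tmp"}} would get spill files under {{/var/drill/tmp/drill/spill}}, generated code under {{/var/drill/tmp/drill/codegen}}, and so on, while any individual property could still be overridden explicitly.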

This ticket asks to:

* Work out a good solution.
* Implement it in the config system.
* Scrub the unit tests and {{DirTestWatcher}} to determine where we can simplify code by reusing this solution rather than ad-hoc, per-directory configs.
* Modify {{DirTestWatcher}} to coordinate with the config system: set the base directory in config, then use the configured paths for each of the persistent store, profile, UDF and other directories.


