You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by arina-ielchiieva <gi...@git.apache.org> on 2016/11/30 16:03:44 UTC

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

GitHub user arina-ielchiieva opened a pull request:

    https://github.com/apache/drill/pull/672

    DRILL-5085: Add / update description for dynamic UDFs directories in \u2026

    \u2026drill-env.sh and drill-module.conf

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arina-ielchiieva/drill DRILL-5085

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #672
    
----
commit 6a67ea1a036054e7c81dc3a3339626dc142a8219
Author: Arina Ielchiieva <ar...@gmail.com>
Date:   2016-11-30T15:30:35Z

    DRILL-5085: Add / update description for dynamic UDFs directories in drill-env.sh and drill-module.conf

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90267374
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -207,11 +207,12 @@ drill.exec: {
           // Set this property if custom absolute root should be used for remote directories, ex: root: "/app/drill".
           // root: "",
     
    -      // Relative base directory for all udf directories (local and remote).
    +      // Base directory for all udf directories (local and remote).
           base: ${drill.exec.zk.root}"/udf",
     
    -      // Relative path to local udf directory.
    -      // Generated temporary directory is used as root unless ${DRILL_TMP_DIR} or ${drill.tmp-dir} is set.
    +      // Relative local udf directory. Drill temporary directory is used as root for this directory.
    +      // Drill temporary directory is set to generated temporary directory
    +      // unless ${DRILL_TMP_DIR} or ${drill.tmp-dir} is set.
    --- End diff --
    
    Still not sure this entirely makes sense. This is a local file system directory. We describe ow the default is based on DRILL_TMP_DIR, etc. That all make sense.
    
    But the actual default value is based on the base directory in the file system configured by fs, which might be HDFS.
    
    And, since we specify a value here, the `drill.tmp.dir` will never be used.
    
    Do we really mean:
    ```
    local: "${drill-tmp-dir}/udfs"
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90687564
  
    --- Diff: distribution/src/resources/drill-override-example.conf ---
    @@ -170,7 +170,17 @@ drill.exec: {
         threadpool_size: 8,
         decode_threadpool_size: 1
       },
    -  debug.error_on_leak: true
    +  debug.error_on_leak: true,
    +  udf: {
    --- End diff --
    
    Comment?
    
    # Settings for Dynamic UDFs. See (link to docs)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90483802
  
    --- Diff: distribution/src/resources/drill-env.sh ---
    @@ -141,4 +141,9 @@
     # Arguments passed to sqlline (the Drill shell) at all times: whether
     # Drill is embedded in Sqlline or not.
     
    -#export DRILL_SHELL_JAVA_OPTS="..."
    \ No newline at end of file
    +#export DRILL_SHELL_JAVA_OPTS="..."
    +
    +# Location Drill should use for temporary files, such as downloaded dynamic UDFs jars.
    +# Set to "/tmp" by default.
    +#
    +# export DRILL_TMP_DIR="..."
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90689224
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -45,11 +45,13 @@ drill.client: {
       supports-complex-types: true
     }
     
    -// Directory is used as base for temporary storage of Dynamic UDF jars.
    -// Set this property if you want to have custom temporary directory, instead of generated at runtime.
    -// By default ${DRILL_TMP_DIR} is used if set.
    -// drill.tmp-dir: "/tmp"
    -// drill.tmp-dir: ${?DRILL_TMP_DIR}
    +// Location Drill uses for temporary files, such as downloaded dynamic UDFs jars.
    --- End diff --
    
    Can we leave it that, if blank, it will use the system-generated directory as you have in 1.9? This means we'll still need your getTmpDir method. To do the check for blank.
    
    Also, as it turns out, we already have a temp dir setting used elsewhere. Look in drill-override-example.conf under drill.sys.store.provider.local.path. It uses "/tmp/drill". Then it adds subdirs for storage plugins and what-not.
    
    Can we use "/tmp/drill/udf" as the base temp storage location for udfs to be consistent? Then we'd add the cluster-id, etc. as in your "cluster-temp-dir" below.
    
    And, yes, we have lots of places where we specify the temp file system (UDFs, the one cited above, the (unused) cache.hazel, the temp dirs for spilling, ...). We do need to create a project (later) to rationalize all of this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #672: DRILL-5085: Add / update description for dynamic UDFs dire...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/672
  
    Oh, and can we copy some of these changes into drill-override-example.conf in distribution/src/resources? Users see the defaults & explanations there, but not those in drill-module.conf.
    
    Maybe just fs, base and local. The others should be fine at their default value for almost all users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90268383
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -45,7 +45,7 @@ drill.client: {
       supports-complex-types: true
     }
     
    -// Directory is used as base for temporary storage of Dynamic UDF jars.
    +// Directory is used as root for temporary storage of Dynamic UDF jars.
    --- End diff --
    
    Where does the user configure the suffix added to this directory to get the temp dir for a given cluster? For example, should we have something like:
    
    drill.cluster-temp-dir: "${drill.tmp-dir}/${drill.exec.zk.root}"
    
    Note that tmp itself is a shared resource. If it is in /tmp, then multiple clusters will use it. If it is in $DRILL_HOME/tmp, then still multiple clusters will use it. Only if it is configured as $DRILL_SITE/tmp can we be guaranteed only one cluster will use the directory (because only one cluster can use each $DRILL_SITE dir -- that's how $DRILL_SITE is defined.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #672: DRILL-5085: Add / update description for dynamic UDFs dire...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/672
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90484226
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -207,11 +207,12 @@ drill.exec: {
           // Set this property if custom absolute root should be used for remote directories, ex: root: "/app/drill".
           // root: "",
     
    -      // Relative base directory for all udf directories (local and remote).
    +      // Base directory for all udf directories (local and remote).
           base: ${drill.exec.zk.root}"/udf",
     
    -      // Relative path to local udf directory.
    -      // Generated temporary directory is used as root unless ${DRILL_TMP_DIR} or ${drill.tmp-dir} is set.
    +      // Relative local udf directory. Drill temporary directory is used as root for this directory.
    +      // Drill temporary directory is set to generated temporary directory
    +      // unless ${DRILL_TMP_DIR} or ${drill.tmp-dir} is set.
    --- End diff --
    
    Updated to local: "${drill.cluster-tmp-dir}/udf/local".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #672: DRILL-5085: Add / update description for dynamic UDFs dire...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/672
  
    @paul-rogers made changes after 2CR. Partially reverted temporary directory implementation logic to provide backward compatibility. Please review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90688304
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/registry/RemoteFunctionRegistry.java ---
    @@ -189,6 +188,7 @@ private void prepareStores(PersistentStoreProvider storeProvider, ClusterCoordin
        * if not set, uses user home directory instead.
        */
       private void prepareAreas(DrillConfig config) {
    +    logger.info("Preparing three remote udf areas: staging, registry and tmp.");
    --- End diff --
    
    Although we suggested keeping the file path stuff as-is, the new logging messages are great; let's keep them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90687630
  
    --- Diff: distribution/src/resources/drill-override-example.conf ---
    @@ -170,7 +170,17 @@ drill.exec: {
         threadpool_size: 8,
         decode_threadpool_size: 1
       },
    -  debug.error_on_leak: true
    +  debug.error_on_leak: true,
    +  udf: {
    +    retry-attempts: 10,
    --- End diff --
    
    Comment to explain this? Just lift wording from the docs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r91085531
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -45,11 +45,13 @@ drill.client: {
       supports-complex-types: true
     }
     
    -// Directory is used as base for temporary storage of Dynamic UDF jars.
    -// Set this property if you want to have custom temporary directory, instead of generated at runtime.
    -// By default ${DRILL_TMP_DIR} is used if set.
    -// drill.tmp-dir: "/tmp"
    -// drill.tmp-dir: ${?DRILL_TMP_DIR}
    +// Location Drill uses for temporary files, such as downloaded dynamic UDFs jars.
    --- End diff --
    
    1. Reverted generated directory logic.
    2. "/tmp/drill/udf" + cluster-id is not good idea, since local udf directory has similar location as remote. And remote directory is generated as cluster-id + /udf. So I suggest to keep it as is. Local registry is returned to use its previous composition logic.
    3. Removed "cluster-temp-dir", we don't need it for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r91078578
  
    --- Diff: distribution/src/resources/drill-override-example.conf ---
    @@ -170,7 +170,17 @@ drill.exec: {
         threadpool_size: 8,
         decode_threadpool_size: 1
       },
    -  debug.error_on_leak: true
    +  debug.error_on_leak: true,
    +  udf: {
    +    retry-attempts: 10,
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90688097
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java ---
    @@ -377,14 +374,12 @@ private ScanResult scan(ClassLoader classLoader, Path path, URL[] urls) throws I
        * Creates local udf directory, if it doesn't exist.
        * Checks if local udf directory is a directory and if current application has write rights on it.
        * Attempts to clean up local udf directory in case jars were left after previous drillbit run.
    -   * Local udf directory path is concatenated from drill temporary directory and ${drill.exec.udf.directory.local}.
        *
        * @param config drill config
        * @return path to local udf directory
        */
       private Path getLocalUdfDir(DrillConfig config) {
    -    tmpDir = getTmpDir(config);
    -    File udfDir = new File(tmpDir, config.getString(ExecConstants.UDF_DIRECTORY_LOCAL));
    +    File udfDir = new File(config.getString(ExecConstants.UDF_DIRECTORY_LOCAL));
    --- End diff --
    
    Although we talked about using the new system you've implemented here, we have to consider backward compatibility.
    
    Since the original behavior is already visible to users in Drill 1.8, I think we need to leave your original design.
    
    At some point, we'll need to rationalize how Drill handles temp files and storage in DFS. But, until then, your 1.8 design is fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r90266159
  
    --- Diff: distribution/src/resources/drill-env.sh ---
    @@ -141,4 +141,9 @@
     # Arguments passed to sqlline (the Drill shell) at all times: whether
     # Drill is embedded in Sqlline or not.
     
    -#export DRILL_SHELL_JAVA_OPTS="..."
    \ No newline at end of file
    +#export DRILL_SHELL_JAVA_OPTS="..."
    +
    +# Location Drill should use for temporary files, such as downloaded dynamic UDFs jars.
    +# Set to "/tmp" by default.
    +#
    +# export DRILL_TMP_DIR="..."
    --- End diff --
    
    Nit: add a newline after this line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r91085385
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/registry/RemoteFunctionRegistry.java ---
    @@ -189,6 +188,7 @@ private void prepareStores(PersistentStoreProvider storeProvider, ClusterCoordin
        * if not set, uses user home directory instead.
        */
       private void prepareAreas(DrillConfig config) {
    +    logger.info("Preparing three remote udf areas: staging, registry and tmp.");
    --- End diff --
    
    Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r91085335
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java ---
    @@ -377,14 +374,12 @@ private ScanResult scan(ClassLoader classLoader, Path path, URL[] urls) throws I
        * Creates local udf directory, if it doesn't exist.
        * Checks if local udf directory is a directory and if current application has write rights on it.
        * Attempts to clean up local udf directory in case jars were left after previous drillbit run.
    -   * Local udf directory path is concatenated from drill temporary directory and ${drill.exec.udf.directory.local}.
        *
        * @param config drill config
        * @return path to local udf directory
        */
       private Path getLocalUdfDir(DrillConfig config) {
    -    tmpDir = getTmpDir(config);
    -    File udfDir = new File(tmpDir, config.getString(ExecConstants.UDF_DIRECTORY_LOCAL));
    +    File udfDir = new File(config.getString(ExecConstants.UDF_DIRECTORY_LOCAL));
    --- End diff --
    
    Ok, reverted usage of generated temporary directory logic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/672#discussion_r91077966
  
    --- Diff: distribution/src/resources/drill-override-example.conf ---
    @@ -170,7 +170,17 @@ drill.exec: {
         threadpool_size: 8,
         decode_threadpool_size: 1
       },
    -  debug.error_on_leak: true
    +  debug.error_on_leak: true,
    +  udf: {
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #672: DRILL-5085: Add / update description for dynamic UD...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/672


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #672: DRILL-5085: Add / update description for dynamic UDFs dire...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/672
  
    1. Made changes after CR.
    2. Added udf config example into drill-override-example.conf.
    3. Minor refactoring.
    
    @paul-rogers please review when possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---