Posted to dev@drill.apache.org by kkhatua <gi...@git.apache.org> on 2017/08/25 23:41:21 UTC

[GitHub] drill pull request #922: DRILL-5741: During startup Drill should not exceed ...

GitHub user kkhatua opened a pull request:

    https://github.com/apache/drill/pull/922

    DRILL-5741: During startup Drill should not exceed the available memory

    This introduces an environment variable, DRILLBIT_MAX_PROC_MEM, to ensure that a Drillbit's max memory parameters, taken together, don't exceed the specified value.
    The variable can be defined in KB, MB, or GB, using the same syntax as the JVM max heap (-Xmx) setting.
    e.g. 
    ```
    DRILLBIT_MAX_PROC_MEM=13G
    DRILLBIT_MAX_PROC_MEM=8192m
    DRILLBIT_MAX_PROC_MEM=4194304K
    ```
    
    In addition, you can specify it as a percentage of the free memory available just before the Drillbit starts up:
    `DRILLBIT_MAX_PROC_MEM=40%`
    For a system with 28 GB of free memory, when set to 40% the Drillbit (with the settings defined in drill-env.sh) fails to start with the following message:
    ```
    2017-08-25 14:58:57  [ERROR]    Unable to start Drillbit due to memory constraint violations
      Max Memory Permitted : 40% (11 of 28 GB free memory)
      Total Memory Requested : 26 GB
      Check the following settings to possibly modify (or increase the Max Memory Permitted):
            -Xmx8g
            -XX:MaxDirectMemorySize=16g
            -XX:ReservedCodeCacheSize=1G
            -XX:MaxPermSize=512M
    
    ```
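    
    A minimal sketch of how the limit might be resolved to a size in MB (the helper name resolve_max_proc_mem and the use of `free -m` are illustrative, not necessarily the exact code in this PR):
    ```
    # Hypothetical helper: convert DRILLBIT_MAX_PROC_MEM (e.g. 13G, 8192m,
    # 4194304K, or 40%) into a limit expressed in MB.
    resolve_max_proc_mem() {
      local value="$1"
      case "$value" in
        *%)    # percentage of the memory currently reported as free
               local free_mb=$(free -m | awk '/^Mem:/ {print $4}')
               echo $(( free_mb * ${value%\%} / 100 )) ;;
        *[gG]) echo $(( ${value%[gG]} * 1024 )) ;;
        *[mM]) echo $(( ${value%[mM]} )) ;;
        *[kK]) echo $(( ${value%[kK]} / 1024 )) ;;
        *)     echo "$value" ;;   # assume MB when no unit is given
      esac
    }
    # e.g. DRILLBIT_MAX_PROC_MEM=40% on a host with 28 GB free -> roughly 11468 MB
    ```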

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kkhatua/drill DRILL-5741

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #922
    
----
commit 04b04d8e4b0c24fef053bbbd6ecebfaebad99d9c
Author: Kunal Khatua <kk...@maprtech.com>
Date:   2017-08-25T23:35:37Z

    DRILL-5741: During startup Drill should not exceed the available memory
    
    This introduces an environment variable, DRILLBIT_MAX_PROC_MEM, to ensure that a Drillbit's max memory parameters, taken together, don't exceed the specified value.
    The variable can be defined in KB, MB, or GB, using the same syntax as the JVM max heap (-Xmx) setting.
    e.g. 
    ```
    DRILLBIT_MAX_PROC_MEM=13G
    DRILLBIT_MAX_PROC_MEM=8192m
    DRILLBIT_MAX_PROC_MEM=4194304K
    ```
    
    In addition, you can specify it as a percentage of the free memory available just before the Drillbit starts up:
    `DRILLBIT_MAX_PROC_MEM=40%`
    For a system with 28 GB of free memory, when set to 40% the Drillbit (with the settings defined in drill-env.sh) fails to start with the following message:
    ```
    2017-08-25 14:58:57  [ERROR]    Unable to start Drillbit due to memory constraint violations
      Max Memory Permitted : 40% (11 of 28 GB free memory)
      Total Memory Requested : 26 GB
      Check the following settings to possibly modify (or increase the Max Memory Permitted):
            -Xmx8g
            -XX:MaxDirectMemorySize=16g
            -XX:ReservedCodeCacheSize=1G
            -XX:MaxPermSize=512M
    ```

----


---

[GitHub] drill issue #922: DRILL-5741: During startup Drill should not exceed the ava...

Posted by kkhatua <gi...@git.apache.org>.
Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/922
  
    @paul-rogers 
    The reason this is needed is that the system admin might not be the one managing the specifics of Drill's memory allocations. It lets a node admin define the limit within which Drill must work, while the individual settings are delegated to the drill-env.sh script, which is managed by a power user.
    
    There is no conflict with the way a service like Warden allocates memory, because Warden only needs to manage the overall allocation, not the internal breakdown. The Drill service continues to own the specifics of its own configuration.
    
    Your solution is largely in line with what this PR proposes. The difference is that, when the env variable is defined, the individual settings **cannot** oversubscribe the available (or maximum permissible) memory; if they do oversubscribe, Drill will not start up.
    
    If there is an overarching resource manager (e.g. YARN), it has the option of either
    
    - providing this limit and relying on files like ```drill-env.sh``` to specify the individual allocations,
    
    or
    
    - providing the explicit parameters for heap, direct memory, etc. to the JVM and **not** specifying ```DRILLBIT_MAX_PROC_MEM``` at all (see the sketch below).
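    
    A rough sketch of the two options as they might appear in ```drill-env.sh``` (DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY are assumed here to be the usual drill-env.sh knobs for -Xmx and -XX:MaxDirectMemorySize; shown only for illustration):
    ```
    # Option 1: node admin caps the overall footprint; the power user sizing
    # the individual pools in drill-env.sh must stay within this cap.
    export DRILLBIT_MAX_PROC_MEM="40%"
    
    # Option 2: the resource manager (or power user) sets the pools explicitly
    # and omits DRILLBIT_MAX_PROC_MEM entirely.
    export DRILL_HEAP="8G"
    export DRILL_MAX_DIRECT_MEMORY="16G"
    ```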
    



---

[GitHub] drill issue #922: DRILL-5741: During startup Drill should not exceed the ava...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/922
  
    This may be one of those times when we need to resort to a bit of design thinking.
    
    The core idea is that the user sets one environment variable to check the others. The first issue is that, if the user can't do the sums to set the Drill memory allocations right (with respect to actual memory), it isn't clear how they will get the total-memory variable right.
    
    OK, so we get the memory from the system, then do a percentage. That is better. But, what is the system memory? Is it total memory? Suppose the user says Drill gets 60%. We can now check. But, Drill is distributed. Newer nodes in a cluster may have 256GB, older nodes 128GB. Drill demands symmetrical resources so the memory given to Drill must be identical on all nodes, regardless of system memory. So, the percent of total system memory idea doesn't work in practice.
    
    So, maybe we express memory as the total *free* memory. Cool. We give Drill 60%. Drill starts and everything is fine. Now, we also give Spark 60%. Spark starts. It complains in its logs (assuming we make this same change to the Spark startup scripts.) But, Spark uses its memory and causes Drill to fail. We check Drill logs. Nada. We have to check Spark's logs. Now, imagine doing this with five apps; the app that complains may not be the one to fail. And, imagine doing this across 100 nodes. Won't scale.
    
    Note that the problem is that we checked memory statically at startup. But, our problem was that things changed later: we launched an over-subscribed Spark. So, our script must run continuously, constantly checking if any new apps are launched. Since some apps grow memory over time, we have to check all other apps for total memory usage against that allocated to Drill.
    
    Now, presumably, all other apps are doing the same: Spark is continually checking, Storm is doing so, and so on. Now, the admin needs to gather all these logs (across dozens of nodes) and extract meaning. What we need, then, is a network endpoint to publish the information and a tool to gather and report that data. We've just invented monitoring tools.
    
    Take a step back: what we really want to know is available system memory vs. that consumed by apps. So, what we want is Linux-level monitoring of free memory. And, since we have other things to do, we want alerts when free memory drops below some point. We've now invented alerting tools.
    
    Now, we got into this mess because we launched apps without concern about the total memory usage on each node. That is, we didn't plan our app load to fit into our available memory. So, we turn this around. We've got 128GB (say) of memory. How do we run only those apps that fit, deferring those that don't? We've just invented YARN, Mesos, Kubernetes and the like.
    
    Now we get to the reason for the -1. The proposed change adds significant complexity to the scripts, *but can never solve the actual oversubscription problem*. For that, we need a global resource manager.
    
    Now, suppose that someone wants to run Drill without such a manager. Perhaps some distribution does not provide this tool and instead provides a tool that simply launches processes, leaving it to each process to struggle with its own resources. In such an environment, the vendor can add a check, such as this one, that will fire on all nodes and warn the user about potential oversubscription *on that node*, *at that moment*, *for that app* in *one app's log file*.
    
    To facilitate this, we can do three things.
    
    1. In the vendor-specific `distrib-env.sh` file, do any memory setting adjustments that are wanted.
    2. Modify `drillbit.sh` to call a `drill-check.sh` script, if it exists, just prior to launching Drill.
    3. In the vendor-specific `distrib-env.sh` file, do the check proposed here.
    
    The only change needed in Apache Drill is step 2 (a sketch follows below). Then each vendor can add the checks if they don't provide a resource manager. Those vendors (or users) that use YARN or Mesos or whatever don't need the checks, because they have overall tools that solve the problem for them.
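    
    A minimal sketch of step 2, assuming the check script sits alongside drill-env.sh in $DRILL_CONF_DIR and that a non-zero exit should block startup (both are assumptions, not decisions made in this thread):
    ```
    # In drillbit.sh, just before launching the Drillbit JVM:
    # run a vendor-supplied check script if one exists.
    if [ -x "$DRILL_CONF_DIR/drill-check.sh" ]; then
      "$DRILL_CONF_DIR/drill-check.sh" || exit 1
    fi
    ```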
    
    Thanks!


---

[GitHub] drill issue #922: DRILL-5741: During startup Drill should not exceed the ava...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/922
  
    -1
    
    This seems like a good idea. But, I don't think this is the way we should go for reasons identified in the JIRA.
    
    For one, we are counting on the user to set a new parameter describing their system memory in order to catch a case where the user has already set other parameters inconsistently with that same system memory. Why would the user set the new one correctly but the others wrong?
    
    The change also overly complicates the shell scripts and conflicts with the way that YARN, Warden, and other tools allocate memory.
    
    A better solution would be (a rough sketch follows this list):
    
    1. When the Drillbit starts, obtain (from the system) the available memory. (Or, if running under YARN, obtain the YARN-allocated memory from an env var. Same for Warden or Mesos or...)
    2. Do the math on the memory allocations.
    3. If the memory is too large, write a warning to the log file.
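    
    A rough sketch of such a check, assuming the available memory is read from /proc/meminfo and the per-pool sizes have already been parsed into MB (all variable names here are illustrative):
    ```
    # Available memory on this node, in MB (MemAvailable is reported in kB).
    avail_mb=$(awk '/^MemAvailable:/ {print int($2/1024)}' /proc/meminfo)
    # If launched by a resource manager, its allocation could be used instead, e.g.:
    # avail_mb=${RM_ALLOCATED_MEMORY_MB:-$avail_mb}   # hypothetical env var
    
    # Sum what the Drillbit is asking for (heap, direct, code cache, perm gen).
    requested_mb=$(( heap_mb + direct_mb + code_cache_mb + perm_mb ))
    
    if [ "$requested_mb" -gt "$avail_mb" ]; then
      echo "[WARN] Drillbit requests ${requested_mb} MB but only ${avail_mb} MB is available" >&2
    fi
    ```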


---

[GitHub] drill issue #922: DRILL-5741: During startup Drill should not exceed the ava...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/922
  
    @kkhatua, thanks for the note. No doubt the proposed model helps in _some_ cases. The problem is, it adds complexity for other cases.
    
    Perhaps offer the change as an additional script that can be called from `drill-env.sh` for those sites that need it, rather than making it part of the core, common script. That way, this solution is orthogonal to, rather than part of, a solution driven by a resource manager such as YARN, Mesos, or Warden.


---

[GitHub] drill pull request #922: DRILL-5741: During startup Drill should not exceed ...

Posted by kkhatua <gi...@git.apache.org>.
Github user kkhatua closed the pull request at:

    https://github.com/apache/drill/pull/922


---

[GitHub] drill issue #922: DRILL-5741: During startup Drill should not exceed the ava...

Posted by kkhatua <gi...@git.apache.org>.
Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/922
  
    Closing this until a rework is complete, based on the recommendations.


---

[GitHub] drill issue #922: DRILL-5741: During startup Drill should not exceed the ava...

Posted by kkhatua <gi...@git.apache.org>.
Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/922
  
    @ppadma Can you please review this?
    The changes have been tested with and without the parameter. 
    If the env variable is set to `40%`, `drillbit.out` reports:
    ```
    Aug 25, 2017 4:30:59 PM org.glassfish.jersey.server.ApplicationHandler initialize
    INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
    2017-08-25 16:32:19  [ERROR]    Unable to start Drillbit due to memory constraint violations
      Max Memory Permitted : 40% (11 of 28 GB free memory)
      Total Memory Requested : 26 GB
      Check the following settings to possibly modify (or increase the Max Memory Permitted):
            -Xmx8g
            -XX:MaxDirectMemorySize=16g
            -XX:ReservedCodeCacheSize=1G
            -XX:MaxPermSize=512M
    ```
    
    If the env variable is set to `10240m`, `drillbit.out` reports:
    ```
    2017-08-25 16:35:18  [ERROR]    Unable to start Drillbit due to memory constraint violations
      Max Memory Permitted : 10 GB
      Total Memory Requested : 26 GB
      Check the following settings to possibly modify (or increase the Max Memory Permitted):
            -Xmx8g
            -XX:MaxDirectMemorySize=16g
            -XX:ReservedCodeCacheSize=1G
            -XX:MaxPermSize=512M
    ```
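    
    For reference, the requested total in both cases is just the sum of the four JVM settings shown: 8 GB heap + 16 GB direct + 1 GB code cache + 0.5 GB perm gen = 25.5 GB, which the check reports (rounded) as 26 GB. With the `40%` setting, the permitted cap is 0.40 × 28 GB of free memory, i.e. roughly 11 GB; with `10240m`, it is a flat 10 GB.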


---