Posted to commits@drill.apache.org by br...@apache.org on 2018/03/22 18:53:28 UTC

drill git commit: 1.13 doc updates - cgroups memory updates

Repository: drill
Updated Branches:
  refs/heads/gh-pages 828b4a597 -> f36b4e8eb


1.13 doc updates - cgroups memory updates


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/f36b4e8e
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/f36b4e8e
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/f36b4e8e

Branch: refs/heads/gh-pages
Commit: f36b4e8eb7b7b9516516d651ac2c2b42a69ff06e
Parents: 828b4a5
Author: Bridget Bevens <bb...@maprtech.com>
Authored: Thu Mar 22 11:52:43 2018 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Thu Mar 22 11:52:43 2018 -0700

----------------------------------------------------------------------
 .../020-configuring-drill-memory.md             | 96 ++++++++++++++++----
 ...-configuring-cgroups-to-control-cpu-usage.md | 94 +++++++++++++++++++
 ...ndix-e-using-cgroups-to-control-cpu-usage.md | 94 -------------------
 3 files changed, 171 insertions(+), 113 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/f36b4e8e/_docs/configure-drill/020-configuring-drill-memory.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/020-configuring-drill-memory.md b/_docs/configure-drill/020-configuring-drill-memory.md
index f2c4122..c9018bc 100644
--- a/_docs/configure-drill/020-configuring-drill-memory.md
+++ b/_docs/configure-drill/020-configuring-drill-memory.md
@@ -1,6 +1,6 @@
 ---
 title: "Configuring Drill Memory"
-date: 2018-03-14 00:58:05 UTC
+date: 2018-03-22 18:52:44 UTC
 parent: "Configure Drill"
 ---
 
@@ -11,42 +11,84 @@ Drill performs well when executing operations in memory instead of storing the o
 The JVM heap memory does not limit the amount of direct memory available in a Drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
 suffice because Drill avoids having data sit in heap memory.  
 
-The following sections describe how to modify the memory allocated to each Drillbit and queries:  
+The following sections describe how to allocate memory to Drillbits and queries, and how to enable or disable bounds checking:  
 
 ## Modifying Memory Allocated to a Drillbit  
 
-Modify the memory allocated to each Drillbit in a cluster in the Drillbit startup script, `<drill_installation_directory>/conf/drill-env.sh`. You must [restart Drill]({{ site.baseurl }}/docs/starting-drill-in-distributed-mode) after you modify the script.
+Modify the memory allocated to each Drillbit in a cluster in the Drillbit startup script, `<drill_installation_directory>/conf/drill-env.sh`. You must [restart Drill]({{ site.baseurl }}/docs/starting-drill-in-distributed-mode) after you modify the script.  
 
-{% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the amount of available direct memory.{% include endnote.html %}
+The `drill-env.sh` file contains the following options:
 
+    #export DRILLBIT_MAX_PROC_MEM=${DRILLBIT_MAX_PROC_MEM:-"13G"}
+    //Maximum cumulative memory allocated to the Drill process during startup. This option was introduced in Drill 1.13.
 
-The `drill-env.sh` file contains the following options:
+    #export DRILL_HEAP=${DRILL_HEAP:-"4G"}
+    //Maximum theoretical heap limit for the JVM per node.
+
+    #export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}  
+    //Java direct memory limit per node.
+
+    #export DRILLBIT_CODE_CACHE_SIZE=${DRILLBIT_CODE_CACHE_SIZE:-"1G"} 
+    //Do not modify the DRILLBIT_CODE_CACHE_SIZE. The value for this parameter is auto-computed based on the heap size and cannot exceed 1GB. 
+
+{% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the amount of available direct memory.{% include endnote.html %}
 
-    #export DRILL_HEAP=${DRILL_HEAP:-"4G”}  
-    #export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
 
 To customize memory limits, uncomment the line needed and change the setting:  
 
-    export DRILL_HEAP=${DRILL_HEAP:-"<limit>”}
-    export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-“<limit>"}  
+    export DRILLBIT_MAX_PROC_MEM=${DRILLBIT_MAX_PROC_MEM:-"<limit>"}
+    export DRILL_HEAP=${DRILL_HEAP:-"<limit>"}
+    export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"<limit>"}   
 
-DRILL_MAX_HEAP is the maximum theoretical heap limit for the JVM per node.  
-DRILL_MAX_DIRECT_MEMORY is the Java direct memory limit per node.  
 
-If performance is an issue, add -Dbounds=false, as shown in the following example:
+For example, if you set `DRILLBIT_MAX_PROC_MEM` to 40G, the total amount of memory allocated to the following memory parameters cannot exceed 40G:  
 
-    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dbounds=false"  
+       DRILL_HEAP=8G
+       DRILL_MAX_DIRECT_MEMORY=10G
+       DRILLBIT_CODE_CACHE_SIZE=1024M
 
-As of Drill 1.13, bounds checking for direct memory is disabled by default. To enable bounds checking for direct memory, use the DRILLBIT_JAVA_OPTS variable to pass the `drill.exec.memory.enable_unsafe_bounds_check` parameter in $DRILL_HOME/conf/drill-env.sh, as shown:  
+At startup, the auto-setup.sh script (introduced in Drill 1.13) checks whether these memory parameters are declared. If they are declared, the script verifies that their cumulative memory does not exceed the value specified by `DRILLBIT_MAX_PROC_MEM`. If the cumulative memory exceeds that value, Drill returns an error message with instructions. See the example below.
 
-    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.exec.memory.enable_unsafe_bounds_check=true"  
+If any of the memory parameters are undefined, the script adjusts the settings for the undefined parameters to ensure that the total memory allocated is within the value defined by `DRILLBIT_MAX_PROC_MEM`.
 
+By default, `DRILLBIT_MAX_PROC_MEM` is not defined. You can define `DRILLBIT_MAX_PROC_MEM` in KB, MB, or GB, as shown:  
 
-For earlier versions of Drill (prior to 1.13), bounds checking is enabled by default. To disable bounds checking, set the `drill.enable_unsafe_memory_access` parameter to true, as shown:  
+       DRILLBIT_MAX_PROC_MEM=13G
+       DRILLBIT_MAX_PROC_MEM=8192M
+       DRILLBIT_MAX_PROC_MEM=4194304K
+
+Alternatively, you can set `DRILLBIT_MAX_PROC_MEM` as a percentage of total memory:  
+
+       DRILLBIT_MAX_PROC_MEM=50%
+
+If you do not set this variable, the check is disabled. To disable the check after you have set the variable, unset it.  
+
+**Example**  
+
+If a system has 48GB of free memory and you set the following parameters in drill-env.sh:  
+
+       DRILLBIT_MAX_PROC_MEM=25%
+       DRILL_HEAP=8G
+       DRILL_MAX_DIRECT_MEMORY=10G
+       DRILLBIT_CODE_CACHE_SIZE=1024M  
 
+The Drillbit fails on startup with the following messages:
 
-    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true"  
+       [WARN] 25% of System Memory (47 GB) translates to 12 GB
+       [ERROR] Unable to start Drillbit due to memory constraint violations Total Memory Requested : 19 GB 
+       Check and modify the settings or increase the maximum amount of memory permitted.
 
+If `DRILLBIT_MAX_PROC_MEM` is increased to 50%, the Drillbit starts up with the following warnings:  
+
+       [WARN] 50% of the system memory (48 GB) translates to 24 GB.
+       [WARN] You have an allocation of 4 GB that is currently unused from a total of 24 GB. 
+       You can increase your existing memory configuration to use this extra memory.  
+
+Additionally, if the available free memory is less than the allocation, the following additional warnings are provided under the assumption that the operating system will reclaim more free memory when required:
+
+       [WARN] Total Memory Allocation for Drillbit (19GB) exceeds available free memory (11GB).
+       [WARN] Drillbit will start up, but can potentially crash due to oversubscribing of system memory.  
+  
 
 ##Modifying Memory Allocated to Queries  
 
@@ -55,7 +97,7 @@ You can configure the amount of memory that Drill allocates to each query as a h
 
 If you modify the memory allocated per query and continue to experience out-of-memory errors, you can try reducing the value of the [`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/) option. Reducing the value of this option reduces the level of parallelism per node. However, this may increase the amount of time required for a query to complete.  
 
-Another option you can modify is the `drill.exec.memory.operator.output_batch_size` option, introduced in Drill 1.13. The  `drill.exec.memory.operator.output_batch_size` option limits the amount of memory that the Flatten, Merge Join, and External Sort operators allocate to outgoing batches. Limiting the memory allocated to outgoing batches can improve concurrency and prevent queries from failing with out-of-memory errors.
+You can also modify the `drill.exec.memory.operator.output_batch_size` option, introduced in Drill 1.13. The `drill.exec.memory.operator.output_batch_size` option limits the amount of memory that the Flatten, Merge Join, and External Sort operators allocate to outgoing batches. Limiting the memory allocated to outgoing batches can improve concurrency and prevent queries from failing with out-of-memory errors.
  
 The average row size of the outgoing batch (calculated from the incoming batch size) determines the number of rows that can fit into the available memory for the batch. If your queries fail with memory errors, reduce the value of the `drill.exec.memory.operator.output_batch_size` option to reduce the output batch size. 
 
@@ -65,5 +107,21 @@ The default value is 16777216 (16 MB). The maximum allowed value is 536870912 (5
 
 Use the ALTER SYSTEM SET command to change the settings, as shown:  
 
-       ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = <value>;
+       ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = <value>;  
+
+##Bounds Checking 
+
+If performance is an issue, add -Dbounds=false, as shown in the following example:
+
+    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dbounds=false"  
+
+As of Drill 1.13, bounds checking for direct memory is disabled by default. To enable bounds checking for direct memory, use the DRILLBIT_JAVA_OPTS variable to pass the `drill.exec.memory.enable_unsafe_bounds_check` parameter in $DRILL_HOME/conf/drill-env.sh, as shown:  
+
+    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.exec.memory.enable_unsafe_bounds_check=true"  
+
+
+For earlier versions of Drill (prior to 1.13), bounds checking is enabled by default. To disable bounds checking, set the `drill.enable_unsafe_memory_access` parameter to true, as shown:  
+
+
+    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true"
   
\ No newline at end of file
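
Note on the memory check described in 020-configuring-drill-memory.md above: the following is a minimal Python sketch of the arithmetic behind the `DRILLBIT_MAX_PROC_MEM` example. It is not the auto-setup.sh script. The function names (`parse_size`, `resolve_limit`, `check_memory`) are hypothetical, and treating K, M, and G as binary units is an assumption. The sketch does not reproduce the script's own rounding or messages, so its numbers may differ slightly from the sample output in the doc.

```python
import re

# Assumption: K, M, and G are binary units (1024-based).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}
GB = 1024**3


def parse_size(text):
    """Convert a size such as '8G', '1024M', or '4194304K' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def resolve_limit(max_proc_mem, system_bytes):
    """Resolve DRILLBIT_MAX_PROC_MEM given as a size or as a percentage."""
    value = max_proc_mem.strip()
    if value.endswith("%"):
        return system_bytes * int(value[:-1]) // 100
    return parse_size(value)


def check_memory(max_proc_mem, heap, direct, code_cache, system_bytes):
    """Return (requested_bytes, limit_bytes, fits_within_limit)."""
    requested = sum(parse_size(v) for v in (heap, direct, code_cache))
    limit = resolve_limit(max_proc_mem, system_bytes)
    return requested, limit, requested <= limit


# Values from the example: a 48 GB system, heap 8G, direct 10G, code cache 1024M.
for setting in ("25%", "50%"):
    requested, limit, fits = check_memory(setting, "8G", "10G", "1024M", 48 * GB)
    print(setting, requested // GB, limit // GB, fits)
```

With the 25% setting, the requested 19 GB exceeds the 12 GB limit, which corresponds to the startup failure in the example. With 50%, the request fits within 24 GB.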

http://git-wip-us.apache.org/repos/asf/drill/blob/f36b4e8e/_docs/configure-drill/121-configuring-cgroups-to-control-cpu-usage.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/121-configuring-cgroups-to-control-cpu-usage.md b/_docs/configure-drill/121-configuring-cgroups-to-control-cpu-usage.md
new file mode 100644
index 0000000..34c5c10
--- /dev/null
+++ b/_docs/configure-drill/121-configuring-cgroups-to-control-cpu-usage.md
@@ -0,0 +1,94 @@
+---
+title: "Configuring cgroups to Control CPU Usage"
+date: 2018-03-22 22:14:41 UTC  
+parent: "Configure Drill"
+---   
+
+Linux cgroups (control groups) enable you to limit system resources to defined user groups or processes. As of Drill 1.13, you can configure a cgroup for Drill to enforce CPU limits on the Drillbit service. You can set a CPU limit for the Drill cgroup on each Drill node in the /etc/cgconfig.conf file.
+
+You can set the CPU limit as a soft or hard limit, or both. The hard limit takes precedence over the soft limit. When Drill hits the hard limit, in-progress queries may not complete.  
+
+##CPU Limits  
+
+You set the soft and hard limits with parameters in the /etc/cgconfig.conf file. The following sections describe the parameters for soft and hard limits.  
+
+**Soft Limit Parameter**  
+You set the soft limit with the `cpu.shares` parameter. When you set a soft limit, Drill can exceed the CPU allocated if extra CPU is available for use on the system. Drill can continue to use CPU until there is contention with other processes over the CPU or Drill hits the hard limit.  
+
+**Hard Limit Parameters**  
+You set the hard limit with the `cpu.cfs_period_us` and `cpu.cfs_quota_us` parameters. The `cpu.cfs_period_us` and `cpu.cfs_quota_us` parameters set a hard limit on the amount of CPU time that the Drill process can use.  
+
+- **`cpu.cfs_period_us`**   
+The `cpu.cfs_period_us` parameter specifies a segment of time (in microseconds, represented by `us` for µs) for how often the access to CPU resources should be reallocated. For example, if tasks in a cgroup can access a single CPU for 0.2 seconds out of every 1 second, set `cpu.cfs_quota_us` to 200000 and `cpu.cfs_period_us` to 1000000. The upper limit of the `cpu.cfs_period_us` parameter is 1 second and the lower limit is 1000 microseconds.    
+
+
+- **`cpu.cfs_quota_us`**  
+The `cpu.cfs_quota_us` parameter specifies the total amount of runtime (in microseconds represented by `us` for µs) for which all tasks in the Drill cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as tasks in the Drill cgroup use the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period. For example, if tasks in the Drill cgroup can access a single CPU for 0.2 seconds out of every 1 second, set `cpu.cfs_quota_us` to 200000 and `cpu.cfs_period_us` to 1000000. A value of -1 indicates that the group does not have any restrictions on CPU.  
+
+##Before You Begin
+Each Drill node must have the libcgroup package installed to configure CPU limits for a Drill cgroup. The libcgroup package installs the cgconfig service required to configure and manage the Drill cgroup.
+
+You can install the libcgroup package using the `yum install` command, as shown:  
+
+       yum install libcgroup  
+
+##Configuring CPU Limits
+Complete the following steps to set a hard and/or soft limit on Drill CPU usage for the Drill process running on the node:  
+
+1-Start the cgconfig service:  
+
+        service cgconfig start
+
+2-Add a cgroup for Drill in the /etc/cgconfig.conf file:    
+
+              group drillcpu {
+                     cpu {
+                            cpu.shares = 320;
+                            cpu.cfs_quota_us = 400000;
+                            cpu.cfs_period_us = 100000;
+                            }
+                     }  
+**Note:** The cgroup name is specific to the Drill cgroup and does not correlate with any other configuration. You can give this group any name you prefer. The name drillcpu is used as an example.  
+  
+In the configuration example, the `cpu.shares` parameter sets the soft limit. The other two parameters, `cpu.cfs_quota_us` and `cpu.cfs_period_us`, set the hard limit. If you prefer to set only one type of limit, remove the parameters that do not apply.  
+
+To set a soft limit, allocate a specific number of CPU shares to the Drill cgroup in the configuration. Calculate the CPU shares as:  
+
+       1024 * (CPU allocated to Drill / Total available CPU)
+
+In the example, 10 of the 32 available CPUs are allocated to Drill, so the CPU shares were calculated as:  
+
+       1024 * (10/32) = 320
+
+
+To set a hard limit, add limits to the `cpu.cfs_quota_us` and `cpu.cfs_period_us` parameters. In the configuration example, the quota (400000) is four times the period (100000), so the Drill process can fully utilize 4 CPUs.  
+
+**Note:** The hard limit parameter settings persist after each cgroup service restart. Alternatively, you can set the parameters at the session level using the following commands:  
+
+       echo 400000 > /cgroup/cpu/drillcpu/cpu.cfs_quota_us
+       echo 100000 > /cgroup/cpu/drillcpu/cpu.cfs_period_us
+
+3-(Optional) If you want the cgconfig service to automatically restart upon system reboots, run the following command:  
+
+       chkconfig cgconfig on  
+  
+4-Add the Drill process ID (PID) to the /cgroup/cpu/drillcpu/cgroup.procs file. In the following command, 25809 is an example PID:  
+
+       echo 25809 > /cgroup/cpu/drillcpu/cgroup.procs
+
+**Note:** You must perform this step each time a Drillbit restarts.  
+
+For additional information, refer to the following documentation:  
+- [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sect-cpu-example_usage](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sect-cpu-example_usage)  
+- [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu_and_memory-use_case](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu_and_memory-use_case)
+
+
+
+
+
+
+
+
+
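
Note on the CPU limits described in 121-configuring-cgroups-to-control-cpu-usage.md above: the soft-limit and hard-limit arithmetic can be sketched in a few lines of Python. This is an illustration only. The helper names (`cpu_shares`, `hard_limit_cpus`, `render_group`) are hypothetical and are not part of Drill or libcgroup. The rendered text simply mirrors the layout of the `drillcpu` group shown in step 2.

```python
def cpu_shares(drill_cpus, total_cpus, base=1024):
    """Soft limit: 1024 * (CPU allocated to Drill / total available CPU)."""
    if not 0 < drill_cpus <= total_cpus:
        raise ValueError("drill_cpus must be positive and at most total_cpus")
    return base * drill_cpus // total_cpus


def hard_limit_cpus(quota_us, period_us):
    """Hard limit as a number of CPUs; None means no restriction (-1)."""
    if quota_us == -1:
        return None
    return quota_us / period_us


def render_group(name, shares, quota_us, period_us):
    """Format a cgconfig.conf group entry like the drillcpu example."""
    return (
        f"group {name} {{\n"
        f"    cpu {{\n"
        f"        cpu.shares = {shares};\n"
        f"        cpu.cfs_quota_us = {quota_us};\n"
        f"        cpu.cfs_period_us = {period_us};\n"
        f"    }}\n"
        f"}}\n"
    )


shares = cpu_shares(10, 32)                # 10 of 32 CPUs allocated to Drill
print(shares)                              # 320
print(hard_limit_cpus(400000, 100000))     # 4.0
print(hard_limit_cpus(200000, 1000000))    # 0.2
print(render_group("drillcpu", shares, 400000, 100000))
```

The quota-to-period ratio is what determines the hard limit: a quota of 400000 µs per 100000 µs period allows up to 4 CPUs, and 200000 µs per 1000000 µs allows one fifth of a single CPU.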

http://git-wip-us.apache.org/repos/asf/drill/blob/f36b4e8e/_docs/drill-on-yarn/094-appendix-e-using-cgroups-to-control-cpu-usage.md
----------------------------------------------------------------------
diff --git a/_docs/drill-on-yarn/094-appendix-e-using-cgroups-to-control-cpu-usage.md b/_docs/drill-on-yarn/094-appendix-e-using-cgroups-to-control-cpu-usage.md
deleted file mode 100644
index 3664e13..0000000
--- a/_docs/drill-on-yarn/094-appendix-e-using-cgroups-to-control-cpu-usage.md
+++ /dev/null
@@ -1,94 +0,0 @@
----
-title: "Appendix E: Using cgroups to Control CPU Usage"
-date:  
-parent: "Drill-on-YARN"
----   
-
-Linux cgroups (control groups) enable you to limit system resources to defined user groups or processes. As of Drill 1.13, you can configure a cgroup for Drill (running under YARN) to enforce CPU limits on the Drillbit service by setting the CPU limit on each Drill node in the /etc/cgconfig.conf file.
-
-You can set the CPU limit as a soft or hard limit, or both. The hard limit takes precedence over the soft limit. When Drill hits the hard limit, in-progress queries may not complete.  
-
-##CPU Limits  
-
-You set the soft and hard limits with parameters in the /etc/cgconfig.conf file. The following sections describe the parameters for soft and hard limits.  
-
-**Soft Limit Parameter**  
-You set the soft limit with the `cpu.shares` parameter. When you set a soft limit, Drill can exceed the CPU allocated if extra CPU is available for use on the system. Drill can continue to use CPU until there is contention with other processes over the CPU or Drill hits the hard limit.  
-
-**Hard Limit Parameters**  
-You set the hard limit with the `cpu.cfs_period_us` and `cpu.cfs_quota_us` parameters. The `cpu.cfs_period_us` and `cpu.cfs_quota_us` parameters set a hard limit on the amount of CPU time that the Drill process can use.  
-
-- **`cpu.cfs_period_us`**   
-The `cpu.cfs_quota_us` parameter specifies a segment of time (in microseconds represented by `us` for µs) for how often the access to CPU resources should be reallocated. For example, if tasks in a cgroup can access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 1000000. The upper limit of the `cpu.cfs_quota_us` parameter is 1 second and the lower limit is 1000 microseconds.    
-
-
-- **`cpu.cfs_quota_us`**  
-The `cpu.cfs_quota_us` parameter specifies the total amount of runtime (in microseconds represented by `us` for µs) for which all tasks in the Drill cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as tasks in the Drill cgroup use the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period. For example, if tasks in the Drill cgroup can access a single CPU for 0.2 seconds out of every 1 second, set `cpu.cfs_quota_us` to 200000 and `cpu.cfs_period_us` to 1000000. A value of -1 indicates that the group does not have any restrictions on CPU.  
-
-##Before You Begin
-Each Drill node must have the libcgroup package installed to configure CPU limits for a Drill cgroup. The libcgroup package installs the cgconfig service required to configure and manage the Drill cgroup.
-
-You can install the libcgroup package using the `yum install` command, as shown:  
-
-       yum install libcgroup  
-
-##Configuring CPU Limits
-Complete the following steps to set a hard and/or soft limit on Drill CPU usage for the Drill process running on the node:  
-
-1-Start the cgconfig service:  
-
-        service cgconfig start
-
-2-Add a cgroup for Drill in the /etc/cgconfig.conf file:    
-
-              group drillcpu {
-                     cpu {
-                            cpu.shares = 320;
-                            cpu.cfs_quota_us = 400000;
-                            cpu.cfs_period_us = 100000;
-                            }
-                     }  
-**Note:** The cgroup name is specific to the Drill cgroup and does not correlate with any other configuration. You can give this group any name you prefer. The name drillcpu is used as an example.  
-  
-In the configuration example, the `cpu.shares` parameter sets the soft limit. The other two parameters, `cpu.cfs_quota_us` and `cpu.cfs_period_us`, set the hard limit. If you prefer to set only one type of limit, remove the parameters that do not apply.  
-
-To set a soft limit, allocate a specific number of CPU shares to the Drill cgroup in the configuration. Calculate the CPU shares as:  
-
-       1024 (CPU allocated to Drill/Total available CPU)
-
-In the example, CPU shares was calculated as:  
-
-       1024 (10/32) = 320
-
-
-To set a hard limit, add limits to the `cpu.cfs_quota_us` and `cpu.cfs_period_us` parameters. In the configuration example, the Drill process can fully utilize 4 CPU.  
-
-**Note:** The hard limit parameter settings persist after each cgroup service restart. Alternatively, you can set the parameters at the session level using the following commands:  
-
-       echo 400000 > /cgroup/cpu/drillcpu/cpu.cfs_quota_us
-       echo 100000 > /cgroup/cpu/drillcpu/cpu.cfs_period_us
-
-3-(Optional) If you want the cgconfig service to automatically restart upon system reboots, run the following command:  
-
-       chkconfig cgconfig on  
-  
-4-Run the following command to add the Drill process ID (PID) to the /cgroup/cpu/drillcpu/cgroup.procs file, as shown:  
-
-       echo 25809 > /cgroup/cpu/drillcpu/cgroup.procs
-
-**Note:** You must perform this step each time a Drillbit restarts.  
-
-For additional information, refer to the following documentation:  
-- [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html)  
-- [resource_management_guide/sect-cpu-example_usage](resource_management_guide/sect-cpu-example_usage)  
-- [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html)
-- [resource_management_guide/sec-cpu_and_memory-use_case](resource_management_guide/sec-cpu_and_memory-use_case)
-
-
-
-
-
-
-
-
-