Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/05/10 09:35:25 UTC

[GitHub] [incubator-druid] clintropolis commented on a change in pull request #7629: Add basic tuning guide, getting started page, updated clustering docs

clintropolis commented on a change in pull request #7629: Add basic tuning guide, getting started page, updated clustering docs
URL: https://github.com/apache/incubator-druid/pull/7629#discussion_r282775078
 
 

 ##########
 File path: docs/content/operations/basic-cluster-tuning.md
 ##########
 @@ -0,0 +1,365 @@
+---
+layout: doc_page
+title: "Basic Cluster Tuning"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+# Basic Cluster Tuning
+
+This document provides basic guidelines for configuration properties and cluster architecture considerations related to performance tuning of an Apache Druid (incubating) deployment. 
+
+Please note that this document provides general guidelines and rules-of-thumb: these are not absolute, universal rules for cluster tuning. This introductory guide is also not an exhaustive description of all Druid tuning properties; those are covered in the [configuration reference](../configuration/index.html).
+
+If you have questions on tuning Druid for specific use cases, or questions on configuration properties not covered in this guide, please ask on the [Druid user mailing list or other community channels](https://druid.apache.org/community/).
+
+## Process-specific guidelines
+
+### Historical
+
+#### Heap sizing
+
+The biggest contributions to heap usage on Historicals are:
+- Partial unmerged query results from segments
+- The stored maps for [lookups](../querying/lookups.html)
+
+A general rule-of-thumb for sizing the Historical heap is `(0.5GB * number of CPU cores)`, with an upper limit of ~24GB.
+
+This rule-of-thumb scales using the number of CPU cores as a convenient proxy for hardware size and level of concurrency (note: this formula is not a hard rule for sizing Historical heaps).
+
+Having a heap that is too large can result in excessively long GC pauses; the ~24GB upper limit is imposed to avoid this.
+
+If caching is enabled on Historicals, the cache is stored on heap, sized by `druid.cache.sizeInBytes`.
+
+Running out of heap on the Historicals can indicate misconfiguration or usage patterns that are overloading the cluster.
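+
+For illustration only (the 16 vCPU machine and 1GB query cache here are assumptions, not recommendations), the rule-of-thumb works out as follows for a hypothetical Historical:
+
+```
+(0.5GB * 16 cores) + 1GB query cache = 9GB heap
+```
+
+which would correspond to `-Xms9g` and `-Xmx9g` in that Historical's `jvm.config`.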
+
+##### Lookups
+
+If you are using lookups, calculate the total size of the lookup maps being loaded. 
+
+Druid performs an atomic swap when updating lookup maps (both the old map and the new map will exist in heap during the swap), so the maximum potential heap usage from lookup maps will be (2 * total size of all loaded lookups).
+
+Be sure to add `(2 * total size of all loaded lookups)` to your heap size in addition to the `(0.5GB * number of CPU cores)` guideline.
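+
+Continuing the hypothetical example above (1GB of loaded lookup maps is an assumption for illustration), the lookup contribution stacks on top of the base rule-of-thumb:
+
+```
+(0.5GB * 16 cores) + (2 * 1GB lookups) + 1GB query cache = 11GB heap
+```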
+
+#### Processing Threads and Buffers
+
+Please see the [General Guidelines for Processing Threads and Buffers](#general-guidelines-for-processing-threads-and-buffers) section for an overview of processing thread/buffer configuration.
+
+On Historicals:
+- `druid.processing.numThreads` should generally be set to `(number of cores - 1)`: a smaller value can result in CPU underutilization, while going over the number of cores can result in unnecessary CPU contention.
+- `druid.processing.buffer.sizeBytes` can be set to 500MB.
+- `druid.processing.numMergeBuffers`: a 1:4 ratio of merge buffers to processing threads is a reasonable choice for general use (example settings for a hypothetical machine are sketched below).
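+
+A sketch of these settings for a hypothetical 16 vCPU Historical (the core count is an assumption; adjust for your hardware):
+
+```
+# Historical runtime.properties (processing-related properties only)
+druid.processing.numThreads=15
+druid.processing.buffer.sizeBytes=500000000
+druid.processing.numMergeBuffers=4
+```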
+
+#### Direct Memory Sizing
+
+The processing and merge buffers described above are direct memory buffers.
+
+When a Historical processes a query, it must open a set of segments for reading. This also requires some direct memory space, described in [segment decompression buffers](#segment-decompression).
+
+A formula for estimating direct memory usage follows:
+
+(`druid.processing.numThreads` + `druid.processing.numMergeBuffers` + 1) * `druid.processing.buffer.sizeBytes`
+
+The `+ 1` factor is a fuzzy estimate meant to account for the segment decompression buffers.
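+
+Using the hypothetical Historical settings sketched above (15 processing threads, 4 merge buffers, 500MB buffers), the estimate works out to:
+
+```
+(15 + 4 + 1) * 500MB = 10GB direct memory
+```
+
+which would typically be reflected in the `-XX:MaxDirectMemorySize` setting in that Historical's `jvm.config`.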
+
+#### Connection Pool Sizing
+
+Please see the [General Connection Pool Guidelines](#general-connection-pool-guidelines) section for an overview of connection pool configuration.
+
+For Historicals, `druid.server.http.numThreads` should be set to a value slightly higher than the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.
+
+Tuning the cluster so that each Historical can accept 50 queries and 10 non-queries is a reasonable starting point.
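+
+As a hedged example (the Broker count and connection pool size are assumptions for illustration): with 3 Brokers each running `druid.broker.http.numConnections=20`, the sum is 60 connections, so the Historical pool could be set slightly higher to leave room for non-query requests:
+
+```
+# Historical runtime.properties
+druid.server.http.numThreads=70
+```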
+
+#### Segment Cache Size
+
+`druid.server.maxSize` controls the total size of segment data that can be assigned by the Coordinator to a Historical.
+
+Segments are memory-mapped by Historical processes using any available free system memory (i.e., memory not used by the Historical heap/direct memory buffers or other processes on the system). Segments that are not currently in memory will be paged from disk when queried.
+
+Therefore, `druid.server.maxSize` should be set such that a Historical is not allocated an excessive amount of segment data. As the value of (`free system memory` / `druid.server.maxSize`) increases, a greater proportion of segments can be kept in memory, allowing for better query performance.
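+
+For example (all numbers are assumptions for illustration): on a machine with 128GB of RAM where the Historical uses roughly 20GB for heap and direct memory, about 100GB of free memory remains for the page cache. Setting `druid.server.maxSize` to ~300GB would then keep roughly one third of the assigned segment data resident in memory at a time:
+
+```
+# Historical runtime.properties (value is in bytes)
+druid.server.maxSize=300000000000
+```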
+
+#### Number of Historicals
+
+The number of Historicals needed in a cluster depends on how much data the cluster has. For good performance, you will want enough Historicals such that each Historical has a good (`free system memory` / `druid.server.maxSize`) ratio, as described in the segment cache size section above.
+
+Having a smaller number of big servers is generally better than having a large number of small servers, as long as you have enough fault tolerance for your use case.
+
+#### SSD storage
+
+We recommend using SSDs for storage on the Historicals, since Historicals serve segment data from local disk.
+
+#### Total Memory Usage
+
+To estimate total memory usage of the Historical under these guidelines:
+
+- Heap: `(0.5GB * number of CPU cores) + (2 * total size of lookup maps) + druid.cache.sizeInBytes`
+- Direct Memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`
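+
+Putting the hypothetical 16 vCPU example from the preceding sections together (all numbers are illustrative assumptions, not recommendations):
+
+```
+Heap:   (0.5GB * 16) + (2 * 1GB lookups) + 1GB cache = 11GB
+Direct: (15 + 4 + 1) * 500MB                         = 10GB
+Total:                                                 ~21GB
+```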
+
+### Broker
+
+#### Heap Sizing
+
+The biggest contributions to heap usage on Brokers are:
+- Partial unmerged query results from Historicals and Tasks
+- The segment timeline
 
 Review comment:
  It might be worth including that this also consists of the locations of all the segments on all Historicals and realtime tasks, but I'm not sure how to do that concisely.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org