Posted to commits@carbondata.apache.org by ja...@apache.org on 2019/12/30 09:42:02 UTC

[carbondata] branch master updated: [DOC][FAQ] add faq for how to deal with slow task

This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 1ded13e  [DOC][FAQ] add faq for how to deal with slow task
1ded13e is described below

commit 1ded13efa0a00f9b04b0714292aedc738b2f2d8d
Author: litao <li...@126.com>
AuthorDate: Wed Dec 18 20:25:23 2019 +0800

    [DOC][FAQ] add faq for how to deal with slow task
    
    This closes #3514
---
 docs/faq.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index 9ba7082..16cdfa5 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -29,6 +29,7 @@
 * [Why all executors are showing success in Spark UI even after Dataload command failed at Driver side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
 * [Why different time zone result for select query output when query SDK writer output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
 * [How to check LRU cache memory footprint?](#how-to-check-lru-cache-memory-footprint)
+* [How to deal with trailing tasks in queries?](#how-to-deal-with-trailing-tasks-in-queries)
 
 # TroubleShooting
 
@@ -227,6 +228,29 @@ This property will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryM
 **Note:** If  `Removed entry from InMemory LRU cache` are frequently observed in logs, you may have to increase the configured LRU size.
 
 To observe the LRU cache from heap dump, check the heap used by CarbonLRUCache class.
+
+## How to deal with trailing tasks in queries?
+
+When tuning query performance, users may find that a few trailing tasks slow down the overall query progress. To improve performance in such cases, users can lower `spark.locality.wait` and set `spark.speculation=true` to enable speculation in Spark, which launches a duplicate copy of a slow task and takes the result of whichever copy finishes first. Users can also consider the following configurations to further improve performance in this case.
+
+**Example:**
+
+```
+spark.locality.wait = 500
+spark.speculation = true
+spark.speculation.quantile = 0.75
+spark.speculation.multiplier = 5
+spark.blacklist.enabled = false
+```
+
+**Note:** 
+
+`spark.locality.wait` controls how long Spark waits for data-local task placement; a small value such as 500 (milliseconds) shortens this wait so that tasks are scheduled sooner on less-local executors.
+
+The `spark.speculation.*` settings are a group of configurations that monitor trailing tasks and launch speculative copies of them once the configured conditions (quantile and multiplier) are met.
+
+Disabling `spark.blacklist.enabled` avoids a reduction of available executors caused by the blacklist mechanism.
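+
+For example, here is a minimal sketch in Scala of applying these settings before the SparkContext is started (the application name and variable name are illustrative only; the same properties can equally be passed with `--conf` on spark-submit or placed in spark-defaults.conf):
+
+```
+import org.apache.spark.sql.SparkSession
+
+// Speculation-related settings must be in place before the SparkContext starts,
+// so set them on the SparkSession builder (or via spark-submit / spark-defaults.conf).
+val spark = SparkSession
+  .builder()
+  .appName("CarbonSpeculationExample")           // illustrative application name
+  .config("spark.locality.wait", "500")          // shorten the data-locality wait
+  .config("spark.speculation", "true")           // enable speculative execution
+  .config("spark.speculation.quantile", "0.75")  // fraction of tasks that must finish before speculation starts
+  .config("spark.speculation.multiplier", "5")   // a task this many times slower than the median is speculated
+  .config("spark.blacklist.enabled", "false")    // keep executors available for speculative copies
+  .getOrCreate()
+```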
+
 ## Getting tablestatus.lock issues When loading data
 
   **Symptom**