Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2022/11/22 21:10:00 UTC
[jira] [Commented] (IMPALA-10001) Find good default value for SORT_RUN_BYTES_LIMIT
[ https://issues.apache.org/jira/browse/IMPALA-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637456#comment-17637456 ]
Riza Suminto commented on IMPALA-10001:
---------------------------------------
Setting SORT_RUN_BYTES_LIMIT comes with a risk of unnecessarily spilling when the query can actually fit all data in memory.
We have been using 512MB in our impala-tpcds-kit script for some time now:
[https://github.com/cloudera/impala-tpcds-kit/blob/d829fc392a70df8300a8d9fd265977fa078a2dab/scripts/impala-insert.sql#L8]
I got to chat with [~noemi], who has been experimenting with the sort implementation a lot.
Generally, we don't want to set SORT_RUN_BYTES_LIMIT too low, as it can cause overly frequent spilling. But we also don't want to set it so high that the in-memory sort plus the spill of an already very large sort run blocks for minutes. SORT_RUN_BYTES_LIMIT=2G might be a good balance between in-memory sort time and spill time.
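To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not Impala's sorter: the function name estimate_runs, its parameters, and the assumption that every spilled run is exactly as large as the effective limit are all hypothetical and only meant to illustrate the reasoning above.

```python
import math
from typing import Optional

MB = 1024 ** 2
GB = 1024 ** 3


def estimate_runs(input_bytes: int,
                  mem_limit_bytes: int,
                  run_limit_bytes: Optional[int]) -> dict:
    """Toy model of sort-run behavior (hypothetical, not Impala code).

    Assumption: a run grows until it reaches the smaller of the memory
    available to the sort and SORT_RUN_BYTES_LIMIT (if set), then it is
    sorted in memory and spilled.
    """
    if run_limit_bytes is None:
        # Option disabled: only the memory limit bounds a run.
        effective = mem_limit_bytes
    else:
        effective = min(run_limit_bytes, mem_limit_bytes)

    if input_bytes <= effective:
        # Everything fits in a single in-memory run; no spill.
        return {"runs": 1, "spills": False, "unnecessary_spill": False}

    runs = math.ceil(input_bytes / effective)
    # The spill is "unnecessary" if the data would have fit in memory
    # had the run limit not been set.
    fits_in_memory = input_bytes <= mem_limit_bytes
    return {"runs": runs, "spills": True, "unnecessary_spill": fits_in_memory}


if __name__ == "__main__":
    for limit in (512 * MB, 2 * GB, None):
        print(limit, estimate_runs(1 * GB, 4 * GB, limit))
```

In this model, a 1 GB sort with 4 GB of memory spills two runs at a 512MB limit even though it would fit in memory, while a 2G limit keeps it in a single in-memory run. Real spill and sort times depend on disk throughput and row layout, which this sketch does not capture.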
> Find good default value for SORT_RUN_BYTES_LIMIT
> ------------------------------------------------
>
> Key: IMPALA-10001
> URL: https://issues.apache.org/jira/browse/IMPALA-10001
> Project: IMPALA
> Issue Type: Improvement
> Components: Perf Investigation
> Reporter: Riza Suminto
> Priority: Minor
>
> IMPALA-6692 added the query option SORT_RUN_BYTES_LIMIT to trigger an early sort before the query hits its memory limit.
> Currently, it is disabled by default. We need to find a good default value for this query option.