You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/07 03:28:00 UTC

[jira] [Commented] (IMPALA-8376) Add per-directory limits for scratch disk usage

    [ https://issues.apache.org/jira/browse/IMPALA-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901664#comment-16901664 ] 

ASF subversion and git services commented on IMPALA-8376:
---------------------------------------------------------

Commit 411189a8d733a66c363c72f8c404123d68640a3e in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=411189a ]

IMPALA-8376: directory limits for scratch usage

This extends the --scratch_dirs syntax to support specifying a max
capacity per directory, similarly to the --data_cache confirmation.
The capacity is delimited from the directory name with ":" and
uses the usual syntax for specifying memory. The following are
valid arguments:
* --scratch_dirs=/dir1,/dir2 (no limits)
* --scratch_dirs=/dir1,/dir2:25G (only a limit on /dir2)
* --scratch_dirs=/dir1:5MB,/dir2 (only a limit on /dir)
* --scratch_dirs=/dir1:-1,/dir2:0 (alternative ways of
  expressing no limit)

The usage is tracked with a metric per directory. Allocations
from that directory start to fail when the limit is exceeded.
These metrics are exposed as
tmp-file-mgr.scratch-space-bytes-used.dir-0,
tmp-file-mgr.scratch-space-bytes-used.dir-1, etc.

Also add support for parsing terabyte specifiers to a utility
function that is used for parsing many configurations.

Testing:
Added a unit test to exercise TmpFileMgr.

Manually ran a spilling query on an impalad with multiple scratch dirs
configured with different limits. Confirmed via metrics that the
capacities were enforced.

Change-Id: I696146a65dbb97f1ba200ae472358ae2db6eb441
Reviewed-on: http://gerrit.cloudera.org:8080/13986
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Add per-directory limits for scratch disk usage
> -----------------------------------------------
>
>                 Key: IMPALA-8376
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8376
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: resource-management
>             Fix For: Impala 3.3.0
>
>
> The current syntax is:
> {noformat}
> --scratch_dirs=/data/1/impala/impalad,/data/10/impala/impalad,/data/11/impala/impalad,/data/2/impala/impalad,/data/3/impala/impalad,/data/4/impala/impalad,/data/5/impala/impalad,/data/6/impala/impalad,/data/7/impala/impalad,/data/8/impala/impalad,/data/9/impala/impalad,/data/12/impala/impalad
> {noformat}
> The current syntax for the data cache is
> {noformat}
> --data_cache_dir=/tmp --data_cache_size=500MB
> {noformat}
> One idea is to allow optionally specifying the limit after each directory:
> {noformat}
> --scratch_dirs=/data/1/impala/impalad:500MB,/data/10/impala/impalad:2GB,/data/11/impala/impalad
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org