Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/06/26 15:41:00 UTC

[jira] [Commented] (IMPALA-9697) Support priority based scratch directory selection

    [ https://issues.apache.org/jira/browse/IMPALA-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146412#comment-17146412 ] 

ASF subversion and git services commented on IMPALA-9697:
---------------------------------------------------------

Commit 3b9ae415e22296683fd905e590c02fe7de3d668c in impala's branch refs/heads/master from Abhishek Rawat
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3b9ae41 ]

IMPALA-9697: Support priority based scratch directory selection

The '--scratch_dirs' configuration option now supports specifying the
priority of each scratch directory. The lower the numeric value, the
higher the priority. If no priority is specified, the default priority
of numeric_limits<int>::max() (the lowest priority) is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
The following formats use the default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:
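
As a rough illustration of the formats listed above, here is a minimal
Python sketch of how one entry could be split into path, limit, and
priority. It is not Impala's parser (that code is C++). The names
parse_scratch_dir and DEFAULT_PRIORITY are hypothetical, the limit is kept
as an unparsed string, and the default assumes a 32-bit int for
numeric_limits<int>::max().

```python
# Stand-in for numeric_limits<int>::max(), assuming a 32-bit int (assumption).
DEFAULT_PRIORITY = 2**31 - 1


def parse_scratch_dir(spec):
    """Split one scratch dir entry into (path, limit, priority).

    Hypothetical helper: the limit is returned as the raw string (or None),
    and paths containing ':' are not handled.
    """
    parts = spec.strip().split(":")
    if len(parts) > 3 or not parts[0]:
        raise ValueError(f"invalid scratch dir spec: {spec!r}")
    path = parts[0]
    # An empty or missing limit field means no limit was given.
    limit = parts[1] if len(parts) > 1 and parts[1] else None
    # An empty or missing priority field falls back to the default.
    priority = int(parts[2]) if len(parts) > 2 and parts[2] else DEFAULT_PRIORITY
    return path, limit, priority
```

For example, parse_scratch_dir("/data1::1") yields ("/data1", None, 1),
while parse_scratch_dir("/data1:200GB:") keeps the limit and falls back
to the default priority.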

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered by
decreasing priority (that is, increasing numeric value). The priority of
a file is the same as the priority of its directory. The search for a
target file always starts from the highest-priority file in the ordered
list. If multiple files have the same priority, the target file is
selected among them in a round-robin manner.
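
The selection scheme described above can be sketched roughly as follows.
This is a simplified Python model, not the actual
TmpFileGroup::AllocateSpace() code. ScratchFile, FileGroup, the per-file
free-space counter, and the per-tier round-robin index are all assumptions
made for illustration.

```python
from itertools import groupby


class ScratchFile:
    """Hypothetical stand-in for a scratch file with a fixed capacity."""

    def __init__(self, name, priority, capacity):
        self.name = name
        self.priority = priority  # lower value means higher priority
        self.free = capacity


class FileGroup:
    def __init__(self, files):
        # Order files by decreasing priority (increasing numeric value) and
        # group files of equal priority into tiers. sorted() is stable, so
        # files within a tier keep their configured order.
        ordered = sorted(files, key=lambda f: f.priority)
        self.tiers = [list(g) for _, g in groupby(ordered, key=lambda f: f.priority)]
        self.next_idx = [0] * len(self.tiers)  # round-robin cursor per tier

    def allocate(self, nbytes):
        # Always start from the highest-priority tier; within a tier, try
        # each file once in round-robin order before moving to the next tier.
        for t, tier in enumerate(self.tiers):
            for _ in range(len(tier)):
                f = tier[self.next_idx[t]]
                self.next_idx[t] = (self.next_idx[t] + 1) % len(tier)
                if f.free >= nbytes:
                    f.free -= nbytes
                    return f.name
        raise RuntimeError("no scratch space left")


group = FileGroup([
    ScratchFile("dir1", 0, 2),
    ScratchFile("dir2", 1, 10),
    ScratchFile("dir3", 1, 10),
    ScratchFile("dir4", 1, 10),
])
picks = [group.allocate(1) for _ in range(6)]
print(picks)  # ['dir1', 'dir1', 'dir2', 'dir3', 'dir4', 'dir2']
```

In this model dir1 absorbs allocations until it is full, after which the
three priority-1 files take turns.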

Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Reviewed-on: http://gerrit.cloudera.org:8080/16091
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Support priority based scratch directory selection 
> ---------------------------------------------------
>
>                 Key: IMPALA-9697
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9697
>             Project: IMPALA
>          Issue Type: Task
>            Reporter: Abhishek Rawat
>            Assignee: Abhishek Rawat
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> The `--scratch_dirs` startup flag uses the given scratch directories in a round-robin manner. This may not always be ideal, since these directories could come from different classes of storage volumes with different performance characteristics (SSD vs HDD, local storage vs network-attached storage, etc.). Giving users an option to configure the priority of their scratch directories could help them optimize their workloads for their storage configuration.
> One possible way would be for the user to pass the priority as part of the `--scratch_dirs` startup flag using <directory>:<spill_priority>. Directories will be selected for spilling based on their priorities, and if multiple directories have the same priority, they will be selected in a round-robin fashion. In the example below, dir1 will be used as the spill target until it is full, and then dir2, dir3, and dir4 will be used in a round-robin fashion.
> {code:java}
> --scratch_dirs="dir1:200GB:0, dir2:1024GB:1, dir3:1024GB:1, dir4:1024GB:1"{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)