You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Abhishek Rawat (Code Review)" <ge...@cloudera.org> on 2020/06/18 00:07:49 UTC

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Abhishek Rawat has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16091


Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................

IMPALA-9697: Support priority based scratch directory selection

The '--scratch_dirs' configuration option now supports specifying the
priority of the scratch direcotry. The lower the numeric value, the
higher is the priority. If priority is not specified then default
priority with value numeric_limits<int>::max() is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
Following formats use default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is same as
the priority of the related directory. A target file is selected by
always searching in the ordered list starting from the file with highest
priority. If multiple files have same priority, then the target file is
selected in a round robin manner.

Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
4 files changed, 363 insertions(+), 47 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/16091/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6049/


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:09:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jun 2020 05:19:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6049/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6388/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Jun 2020 16:13:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.h@103
PS1, Line 103:     TmpDir(const std::string& path, int64_t bytes_limit, int priority,
> line too long (99 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.h@104
PS1, Line 104:       IntGauge* bytes_used_metric)
> line too long (105 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc@194
PS1, Line 194:         LOG(ERROR) << "Malformed scratch directory capacity configuration '"
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc@207
PS1, Line 207:         LOG(ERROR) << "Malformed scratch directory priority configuration '"
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc@259
PS1, Line 259:         tmp_dirs_.emplace_back(scratch_subdir_path.string(), tmp_dirs[i]->bytes_limit,
> tab used for whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py
File tests/custom_cluster/test_scratch_disk.py:

http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@20
PS1, Line 20: import os
> flake8: F401 'copy' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@230
PS1, Line 230: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@231
PS1, Line 231: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@232
PS1, Line 232: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@233
PS1, Line 233: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@234
PS1, Line 234: 
> flake8: E231 missing whitespace after ','
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Jun 2020 15:50:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 2:

(4 comments)

I think this basically looks good, thanks for adding all the tests.

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr-test.cc@1126
PS2, Line 1126:   const int alloc_size = 1024;
Comment that each scratch directory can fit exactly one allocation?


http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.h@116
PS2, Line 116:     int const priority;
nit: const int. Should be const int64_t for bytes_limit too.

'git blame' shows that I'm responsible for bytes_limit, but no idea why I wrote it that way.


http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc@216
PS2, Line 216:     tmp_dirs.emplace_back(std::move(tmp_dir));
Why not just directly emplace_back, i.e.

 tmp_dirs.emplace_back(new TmpDir(...));


http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc@540
PS2, Line 540:       << "Invalid index range";
Maybe print the values in the DCHECK error



-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Jun 2020 23:37:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6055/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jun 2020 05:19:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr-test.cc@898
PS1, Line 898:                        "/tmp/tmp-file-mgr-test3:1234:3,/tmp/tmp-file-mgr-test4:99999999:4,"
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr-test.cc@899
PS1, Line 899:                        "/tmp/tmp-file-mgr-test5:200tb:5,/tmp/tmp-file-mgr-test6:100MB:6"));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.h@103
PS1, Line 103:     TmpDir(const std::string& path, int64_t bytes_limit, int priority, IntGauge* bytes_used_metric)
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.h@104
PS1, Line 104:       : path(path), bytes_limit(bytes_limit), priority(priority), bytes_used_metric(bytes_used_metric) {}
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc@194
PS1, Line 194:         LOG(ERROR) << "Malformed scratch directory capacity configuration '" << tmp_dir_spec
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc@207
PS1, Line 207:         LOG(ERROR) << "Malformed scratch directory priority configuration '" << tmp_dir_spec
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/16091/1/be/src/runtime/tmp-file-mgr.cc@259
PS1, Line 259: 	tmp_dirs_.emplace_back(scratch_subdir_path.string(), tmp_dirs[i]->bytes_limit,
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py
File tests/custom_cluster/test_scratch_disk.py:

http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@20
PS1, Line 20: import copy
flake8: F401 'copy' imported but unused


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@230
PS1, Line 230: ,
flake8: E231 missing whitespace after ','


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@231
PS1, Line 231: ,
flake8: E231 missing whitespace after ','


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@232
PS1, Line 232: ,
flake8: E231 missing whitespace after ','


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@233
PS1, Line 233: ,
flake8: E231 missing whitespace after ','


http://gerrit.cloudera.org:8080/#/c/16091/1/tests/custom_cluster/test_scratch_disk.py@234
PS1, Line 234: ,
flake8: E231 missing whitespace after ','



-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jun 2020 00:08:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................

IMPALA-9697: Support priority based scratch directory selection

The '--scratch_dirs' configuration option now supports specifying the
priority of the scratch direcotry. The lower the numeric value, the
higher is the priority. If priority is not specified then default
priority with value numeric_limits<int>::max() is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
Following formats use default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is same as
the priority of the related directory. A target file is selected by
always searching in the ordered list starting from the file with highest
priority. If multiple files have same priority, then the target file is
selected in a round robin manner.

Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
4 files changed, 368 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/16091/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:45:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................

IMPALA-9697: Support priority based scratch directory selection

The '--scratch_dirs' configuration option now supports specifying the
priority of the scratch direcotry. The lower the numeric value, the
higher is the priority. If priority is not specified then default
priority with value numeric_limits<int>::max() is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
Following formats use default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is same as
the priority of the related directory. A target file is selected by
always searching in the ordered list starting from the file with highest
priority. If multiple files have same priority, then the target file is
selected in a round robin manner.

Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
4 files changed, 370 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/16091/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6360/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jun 2020 00:32:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6052/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:45:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................

IMPALA-9697: Support priority based scratch directory selection

The '--scratch_dirs' configuration option now supports specifying the
priority of the scratch direcotry. The lower the numeric value, the
higher is the priority. If priority is not specified then default
priority with value numeric_limits<int>::max() is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
Following formats use default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is same as
the priority of the related directory. A target file is selected by
always searching in the ordered list starting from the file with highest
priority. If multiple files have same priority, then the target file is
selected in a round robin manner.

Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Reviewed-on: http://gerrit.cloudera.org:8080/16091
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
4 files changed, 370 insertions(+), 49 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6416/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 01:41:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6052/


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jun 2020 03:43:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc@540
PS2, Line 540:     for (int index = start; index <= end; ++index) {
> Maybe print the values in the DCHECK error
Checked that this was fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jun 2020 10:20:39 +0000
Gerrit-HasComments: No