You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Yida Wu (Code Review)" <ge...@cloudera.org> on 2020/06/14 21:51:26 UTC

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16078


Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................

IMPALA-9829: Add metrics for Spilling to S3

Metrics of Spilling to HDFS and S3 are added when the HDFS or S3 path has been passed in scatch_dir, like --scratch_dirs='hdfs://hdfspath, s3a://s3path'.

Three types of metrics added in this case:
1. tmp-file-mgr.s3/hdfs.write-latency, unit: ns
2. tmp-file-mgr.s3/hdfs.write-size, unit: Bytes
3. tmp-file-mgr.s3/hdfs.write-io-error

It is a mock function for spill to s3, more metrics or test cases might be added later. Also, the remote paths should be verified to be valid later.

Limitations:
1. No support for HDFS 'port number' by now, due to the current delimiter ':' may conflict the use of 'port number', the rule of the format might be changed later.
2. Only one HDFS disk and one S3 disk can be supported by now.

Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
6 files changed, 315 insertions(+), 72 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/16078/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16078 )

Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc@1054
PS1, Line 1054:         metrics_->FindMetricForTesting<HistogramMetric>("tmp-file-mgr.hdfs.write-latency");
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc@1090
PS1, Line 1090:         metrics_->FindMetricForTesting<HistogramMetric>("tmp-file-mgr.hdfs.write-latency");
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc@1122
PS1, Line 1122:         metrics_->FindMetricForTesting<HistogramMetric>("tmp-file-mgr.hdfs.write-latency");
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@242
PS1, Line 242:        metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_LATENCY, ""), 
> line too long (105 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@242
PS1, Line 242:        metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_LATENCY, ""), 
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@245
PS1, Line 245:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_SIZE, ""), 
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@245
PS1, Line 245:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_SIZE, ""), 
> line too long (103 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@249
PS1, Line 249:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);  
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@249
PS1, Line 249:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);  
> line too long (106 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@254
PS1, Line 254:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_LATENCY, ""), 
> line too long (104 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@254
PS1, Line 254:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_LATENCY, ""), 
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@257
PS1, Line 257:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_SIZE, ""), 
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@257
PS1, Line 257:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_SIZE, ""), 
> line too long (101 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@261
PS1, Line 261:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);          
> line too long (114 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@261
PS1, Line 261:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);          
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@290
PS1, Line 290:           LOG(INFO) << "Using scratch directory " << scratch_subdir_path.string() << " on "
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/util/hdfs-util.cc@76
PS1, Line 76: bool IsSpecificPath(const char* path, const char* specific_prefix, bool check_default_fs) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/util/hdfs-util.cc@79
PS1, Line 79:     return strncmp(ExecEnv::GetInstance()->default_fs().c_str(), specific_prefix, prefix_len) == 0;
> line too long (99 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Jun 2020 13:45:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has abandoned this change. ( http://gerrit.cloudera.org:8080/16078 )

Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................


Abandoned

open a new commit
-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16078 )

Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6326/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Jun 2020 14:03:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16078 )

Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................

IMPALA-9829: Add metrics for Spilling to S3

Metrics of Spilling to HDFS and S3 are added when the HDFS or S3 path has been passed in scatch_dir, like --scratch_dirs='hdfs://hdfspath, s3a://s3path'.

Three types of metrics added in this case:
1. tmp-file-mgr.s3/hdfs.write-latency, unit: ns
2. tmp-file-mgr.s3/hdfs.write-size, unit: Bytes
3. tmp-file-mgr.s3/hdfs.write-io-error

It is a mock function for spill to s3, more metrics or test cases might be added later. Also, the remote paths should be verified to be valid later.

Limitations:
1. No support for HDFS 'port number' by now, due to the current delimiter ':' may conflict the use of 'port number', the rule of the format might be changed later.
2. Only one HDFS disk and one S3 disk can be supported by now.

Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
6 files changed, 318 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/16078/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16078 )

Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc@1054
PS1, Line 1054:         metrics_->FindMetricForTesting<HistogramMetric>("tmp-file-mgr.hdfs.write-latency");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc@1090
PS1, Line 1090:         metrics_->FindMetricForTesting<HistogramMetric>("tmp-file-mgr.hdfs.write-latency");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr-test.cc@1122
PS1, Line 1122:         metrics_->FindMetricForTesting<HistogramMetric>("tmp-file-mgr.hdfs.write-latency");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@242
PS1, Line 242:        metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_LATENCY, ""), 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@242
PS1, Line 242:        metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_LATENCY, ""), 
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@245
PS1, Line 245:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_SIZE, ""), 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@245
PS1, Line 245:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_HDFS_WRITE_SIZE, ""), 
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@249
PS1, Line 249:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@249
PS1, Line 249:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);  
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@254
PS1, Line 254:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_LATENCY, ""), 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@254
PS1, Line 254:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_LATENCY, ""), 
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@257
PS1, Line 257:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_SIZE, ""), 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@257
PS1, Line 257:         metrics->RegisterMetric(new HistogramMetric(MetricDefs::Get(TMP_FILE_MGR_S3_WRITE_SIZE, ""), 
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@261
PS1, Line 261:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);          
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@261
PS1, Line 261:       tmp_dirs_.emplace_back(tmp_path.string(), tmp_dirs[i].bytes_limit, tmp_dirs[i].bytes_used_metric);          
line too long (114 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/runtime/tmp-file-mgr.cc@290
PS1, Line 290:           LOG(INFO) << "Using scratch directory " << scratch_subdir_path.string() << " on "
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/util/hdfs-util.cc@76
PS1, Line 76: bool IsSpecificPath(const char* path, const char* specific_prefix, bool check_default_fs) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16078/1/be/src/util/hdfs-util.cc@79
PS1, Line 79:     return strncmp(ExecEnv::GetInstance()->default_fs().c_str(), specific_prefix, prefix_len) == 0;
line too long (99 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Jun 2020 21:52:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9829: Add metrics for Spilling to S3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16078 )

Change subject: IMPALA-9829: Add metrics for Spilling to S3
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6318/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d15c691d52af00c51af489cb67386772f3dec4
Gerrit-Change-Number: 16078
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Jun 2020 22:20:43 +0000
Gerrit-HasComments: No