You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Jiawei Wang (Code Review)" <ge...@cloudera.org> on 2019/11/01 06:47:18 UTC

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Jiawei Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14611


Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function.

We added "table-schema-load-duration", "all-column-stats-load
-duration", "all-column-stats-load-duration". And we found that
"storage-metadata-load-duration" has already represent partition
and file metadata loading. So now we have a more detailed
breakdown time for table loading info.

Test:
Will add test after the first round review.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
2 files changed, 60 insertions(+), 40 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 1
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14611

to look at the new patch set (#2).

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

Solution:
- We added "hms-load-tbl-schema", "load-duration-all-column-stats".
Also, we logged the loadValidWriteIdList() time. And we found
that "storage-metadata-load-duration" has already represent
partition and file metadata loading. So now we have a more
detailed breakdown time for table loading info.
- Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd

Test:
1. Will add metrics test after the few rounds of code review.
2. Add Unit tests for PrintUtils.printTime() function

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
8 files changed, 148 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 2
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 1:

(3 comments)

I think we missed the loading time of msDb and msTbl objects. We use msClient.getHiveClient().getTable(dbName, tblName) in many places to fetch the msTbl object from HMS. We also care about time spent in those places and it's the real schema-loading-time. May need some refactor to measure this time since it's here and there in our code base.

http://gerrit.cloudera.org:8080/#/c/14611/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14611/1//COMMIT_MSG@13
PS1, Line 13: We added "table-schema-load-duration", "all-column-stats-load
            : -duration", "all-column-stats-load-duration"
Can we also measure the queue time for loading a table? E.g. in TableLoadingMgr.loadAsync, we also want to know how long it takes to start loading the table after we created the tableLoadTask. This is the code path when catalogd launchs or receives a prioritizeLoad request from coordinators.


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1229
PS1, Line 1229:   private void loadSchema(org.apache.hadoop.hive.metastore.api.Table msTbl)
I think this function does not invoke any HMS RPCs so the time will be very short. What we care more is the time to get the msTbl object, e.g. https://github.com/apache/impala/blob/50e0fbf889465eb288a5ef69477ccbc627681105/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L68

Table schema is the required fields of msTbl object (inside StorageDescriptor). So we already loaded it after we get the msTbl object. This function just extracts the schema. The schema loading time should be the time fetching the msTbl object.


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java@30
PS1, Line 30: import com.codahale.metrics.Timer;
nit: looks like we import this in below section just before "import com.google.common.xxx" in other files.



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 1
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Nov 2019 03:53:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5273/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 8
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 21:16:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4959/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 2
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 06:41:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4929/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 1
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Nov 2019 07:31:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@962
PS1, Line 962: updatePartitionsFromHms
Does this also account for the time taken for loading file metadata? If yes, are we logging it somewhere?


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@975
PS1, Line 975: Last for
I would recommend something like Time Taken: <value> ns. here. Also, logging the time value in ns is not much helpful. I think it will be useful since most users will not need a nano-second level resolution. I would suggest to convert it to msec for logging purposes.

Same for the other logs which say "Last for"


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1229
PS1, Line 1229:   private void loadSchema(org.apache.hadoop.hive.metastore.api.Table msTbl)
> I think this function does not invoke any HMS RPCs so the time will be very
+1


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java@334
PS1, Line 334:         getMetrics().getTimer("load-duration-valid-write-id-list").time();
I don't think there is much value in having this since we don't expect this to be a bottleneck for most cases. May be just log the time for now?



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 1
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Nov 2019 18:31:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14611/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14611/2//COMMIT_MSG@17
PS2, Line 17: We added "hms-load-tbl-schema", "load-duration-all-column-stats".
            : Also, we logged the loadValidWriteIdList() time
Are you planning to add metrics for partition load time separately? I think it will useful to do it in one patch if possible. In my opinion just having a tbl load time without partitions may not be very interesting. Do you think we should have a metric for loading all the partitions of a along with the table? Its also tricky because the time value depends on the number of partitions which are being loaded. So the metric may not give any insights (for example, a large table with many partitions may show small loading times, if we are only loading one partition compared to a small table which loads all the partitions).


http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/Table.java@38
PS2, Line 38: import org.apache.impala.common.*;
I think the convention is to not use * and import each individually for improved readability and not importing unnecessary classes when they are added to the package in the future.


http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@71
PS2, Line 71: hmsLoadSW
Shouldn't this line be after line number 86?


http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@264
PS2, Line 264: startTableLoadingSubmitterThreads
Indeed, this is very confusing. Looks like we have pool of threads to submit the loadTask. All these threads are really doing is to taking out the table from the deque, making sure its not being already loaded and then if its not being loaded already, submit it to the next pool tblLoadingPool_ which actually does the loading of the table.

This path is used for loading the tables which are requested by coordinators (prioritized loading) when they are analyzing the queries or the background loading which is enabled with the flag loadInBackground_ is set. I found a bug in the reset() method which adds the tables to the backgroundload unnecessarily (see IMPALA-9139)

I think we can get rid of the pool for submitting the tasks. and we can simply this by making any load request would either be a background load (offerLast in the queue) and prioritized loading (offerFirst in the queue). Since this code is time-consuming, we don't need a pool of threads for doing this. The loadAsync() could be implemented using a prioritizedLoad code-path so that its same as what we have today. Created IMPALA-9140 for this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 2
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Nov 2019 19:15:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Yongzhi Chen, Vihang Karajgaonkar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14611

to look at the new patch set (#7).

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time",
"load-duration.all-partitions.file-metadata".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: we rename it to load-duration.storage-metadata. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Extra thing in this commit:
1. Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd
2. Add explanation for table loading manager

D. Test:
1. Add Unit tests for PrintUtils.printTime() function
2. Manual describe table and verify the table loading metrics are
correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
10 files changed, 186 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 7
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 2:

(4 comments)

Thanks for the feedback! I add the time for measuring HMS loading table schema.

For the table loading queue time, Quanlong suggest we should do the following:

"Table queue time is not related to the table itself. But just like query queue time, it's an important metric to understand why the loading is slow.

I think we can track it as a global metric that not bind to any tables. Showing p90th table loading queue time in recent time window (e.g. 1h) can be helpful."

It makes a lot of sense to me. But I am wondering how will we approve this. Might need some guidance on getting the queue time done.

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@962
PS1, Line 962: 
> Does this also account for the time taken for loading file metadata? If yes
I am not sure if I understand the question here.

The time taken for updatePartitionsFromHms() will also contribute to storageMetadataLoadTime_ (which is time spent in the source systems loading/reloading the fs metadata for the table)

And we have this metrics "storage-metadata-load-duration" to keep track of the storageMetadataLoadTime_.


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@975
PS1, Line 975: _ = load
> I would recommend something like Time Taken: <value> ns. here. Also, loggin
Adding a PrettyPrint function for Time in frontend. Done.


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1229
PS1, Line 1229:   /**
> I think this function does not invoke any HMS RPCs so the time will be very
Done


http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java@334
PS1, Line 334:   }
> I don't think there is much value in having this since we don't expect this
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 2
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 06:00:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5278/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 9
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 06:33:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 9:

rebase the commit to solve merge conflict.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 9
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 05:37:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 2:

(1 comment)

Also, I renamed the startTableLoadingThreads to startTableLoadingSubmitterThreads. The current code is kind of confusing to understand...

http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/14611/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@244
PS2, Line 244:           //LOG.info();
nit: remove this



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 2
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 06:04:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 8
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 21:16:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1238
PS1, Line 1238:       if (nullColumnValue_ == null) nullColumnValue_ = FeFsTable.DEFAULT_NULL_COLUMN_VALUE;
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 1
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Nov 2019 06:47:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4997/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 3
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Sun, 10 Nov 2019 09:54:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 7:

(3 comments)

Thanks for the review. Let me know if we need add anything else in this commit.

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@163
PS6, Line 163: load-duration.all-partitions";
> I understand that the naming is used to represent nesting of the metrics. S
Done


http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/Table.java@149
PS6, Line 149: load-duration";
> Since this is the top level load-duration adding a specific --total-time is
Done


http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@262
PS6, Line 262: There is a discussion here: https://issues.apache.org/jira/brows
> May be just say (See discussion in ..) :)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 7
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 20:28:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Yongzhi Chen, Vihang Karajgaonkar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14611

to look at the new patch set (#5).

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

Solution:
- We added "hms-load-tbl-schema", "load-duration-all-column-stats",
"all-partitions-load-duration",
"file-metadata-all-partitions-load-duration".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: all-column-stats-load-duration

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: storage-metadata-load-duration. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: all-partitions-load-duration

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: file-metadata-all-partitions-load-duration

- Extra function: Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd

Test:
1. Will add metrics test after the few rounds of code review.
2. Add Unit tests for PrintUtils.printTime() function

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
8 files changed, 153 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 5
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Yongzhi Chen, Vihang Karajgaonkar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14611

to look at the new patch set (#6).

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time",
"load-duration.all-partitions.file-metadata".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: we rename it to load-duration.storage-metadata. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Extra thing in this commit:
1. Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd
2. Add explanation for table loading manager

D. Test:
1. Add Unit tests for PrintUtils.printTime() function
2. Manual describe table and verify the table loading metrics are
correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
10 files changed, 188 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 6
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 9
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 11:10:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4999/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 5
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Sun, 10 Nov 2019 21:00:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5115/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 9
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 06:20:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5080/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 6
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Nov 2019 07:34:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 7: Code-Review+2

Thanks for making the changes. The patch looks good to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 7
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 21:15:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time",
"load-duration.all-partitions.file-metadata".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: we rename it to load-duration.storage-metadata. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Extra thing in this commit:
1. Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd
2. Add explanation for table loading manager

D. Test:
1. Add Unit tests for PrintUtils.printTime() function
2. Manual describe table and verify the table loading metrics are
correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Reviewed-on: http://gerrit.cloudera.org:8080/14611
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
10 files changed, 186 insertions(+), 35 deletions(-)

Approvals:
  Vihang Karajgaonkar: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 10
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4998/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 4
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Sun, 10 Nov 2019 09:59:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 6:

(3 comments)

Have some more suggestions below and patch is good to go from my side. Thanks!

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@163
PS6, Line 163: load-duration.all-partitions.total-time
I understand that the naming is used to represent nesting of the metrics. Since "load-duration.all-partitions.total-time" includes "load-duration.all-partitions.file-metadata" should we call it "load-duration.all-partitions"?


http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/Table.java@149
PS6, Line 149: load-duration.total-time
Since this is the top level load-duration adding a specific --total-time is redundant in my opinion.


http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/14611/6/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@262
PS6, Line 262: For people having hard time understanding impala table loading. 
May be just say (See discussion in ..) :)



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 6
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 18:01:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

Solution:
- We added "hms-load-tbl-schema", "load-duration-all-column-stats",
"all-partitions-load-duration",
"file-metadata-all-partitions-load-duration".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: all-column-stats-load-duration

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: storage-metadata-load-duration. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: all-partitions-load-duration

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: file-metadata-all-partitions-load-duration

- Extra function: Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd

Test:
1. Will add metrics test after the few rounds of code review.
2. Add Unit tests for PrintUtils.printTime() function

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
8 files changed, 156 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 3
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 9: Code-Review+2

> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5273/

The job failed due to "Script create-load-data.sh timed out". It probably is some jenkins issue. Will retrigger it one more time to make sure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 9
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 06:35:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5104/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 7
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 21:14:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Yongzhi Chen, Vihang Karajgaonkar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14611

to look at the new patch set (#9).

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time",
"load-duration.all-partitions.file-metadata".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: we rename it to load-duration.storage-metadata. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Extra thing in this commit:
1. Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd
2. Add explanation for table loading manager

D. Test:
1. Add Unit tests for PrintUtils.printTime() function
2. Manual describe table and verify the table loading metrics are
correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
10 files changed, 186 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 9
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 1:

(1 comment)

The table loading process is kind of confusing to anyone reading it for the first time. Here are some good explanation from Quanlong:

" There are two kinds of table loading requests:
1) Async: load requests from invalidate metadata when --load_table_in_background=true. PrioritizedLoad requests from coordinators when missing table meta in analysis.
2) Sync: load requests from DDLs and some RPCs like getPartitionStats and getPartialCatalogObject.

All async requests are put in tableLoadingDeque_ and deduplicated by tableLoadingBarrier_. tableLoadingDeque_ is consumed by threads launched in startTableLoadingThreads(). These threads call CatalogServiceCatalog.getOrLoadTable() finally to load the table. Note that getOrLoadTable() is sync. It only returns when table already loaded or loading finish.
All sync requests will also go to CatalogServiceCatalog.getOrLoadTable(). All loading threads are running in tblLoadingPool_.

So threads in startTableLoadingThreads() are just submitters for background load requests. All table loading threads are in tblLoadingPool"

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/14611/1/fe/src/main/java/org/apache/impala/catalog/Table.java@30
PS1, Line 30: import com.codahale.metrics.Timer;
> nit: looks like we import this in below section just before "import com.goo
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 1
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Nov 2019 18:22:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5273/


-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 8
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 02:16:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Posted by "Jiawei Wang (Code Review)" <ge...@cloudera.org>.
Jiawei Wang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/14611 )

Change subject: IMPALA-9110: Add table loading time break-down metrics for HdfsTable
......................................................................

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

Solution:
- We added "hms-load-tbl-schema", "load-duration-all-column-stats",
"all-partitions-load-duration",
"file-metadata-all-partitions-load-duration".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time(standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch table object and the real schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: all-column-stats-load-duration

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading(standalone metric)
* Meaning: Storage related to file system operations during metadata
loading.(The amount of time spent loading metadata from the underlying storage layer.)
* Metric: storage-metadata-load-duration. This is a metric introduced by
IMPALA-7322

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: all-partitions-load-duration

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all all partitions. (This is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: file-metadata-all-partitions-load-duration

- Extra function: Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd

Test:
1. Will add metrics test after the few rounds of code review.
2. Add Unit tests for PrintUtils.printTime() function

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/util/PrintUtilsTest.java
8 files changed, 156 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14611/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Gerrit-Change-Number: 14611
Gerrit-PatchSet: 4
Gerrit-Owner: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jiawei Wang <ji...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>