You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/11/07 15:50:56 UTC

[GitHub] [incubator-doris] xy720 opened a new pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

xy720 opened a new pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030


   ## Proposed changes
   
   ![image](https://user-images.githubusercontent.com/22125576/140652063-ad3a334e-2538-4e80-8813-6746bd3a603c.png)
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [x] New feature (non-breaking change which adds functionality)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue on (Fix #6945 ) and described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747347048



##########
File path: fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java
##########
@@ -123,6 +124,15 @@ private void auditAfterExec(String origStmt, StatementBase parsedStmt, Data.PQue
             .setStmtId(ctx.getStmtId())
             .setQueryId(ctx.queryId() == null ? "NaN" : DebugUtil.printId(ctx.queryId()));
 
+        if (statistics != null && statistics.getNodesStatisticsCount() != 0) {

Review comment:
       Now the calculation will be completed in result_sink in the top level fragment at the be side, but the collected node statistics will still be transmitted to Fe.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java
##########
@@ -123,6 +124,15 @@ private void auditAfterExec(String origStmt, StatementBase parsedStmt, Data.PQue
             .setStmtId(ctx.getStmtId())
             .setQueryId(ctx.queryId() == null ? "NaN" : DebugUtil.printId(ctx.queryId()));
 
+        if (statistics != null && statistics.getNodesStatisticsCount() != 0) {

Review comment:
       Now the calculation will be completed in result sink in the top level fragment at the be side, but the collected node statistics will still be transmitted to Fe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r746202859



##########
File path: be/src/runtime/query_statistics.h
##########
@@ -27,25 +27,49 @@ namespace doris {
 
 class QueryStatisticsRecvr;
 
+class NodeStatistics {
+public:
+    NodeStatistics() : peak_memory_bytes(0) {};
+
+    void add_peak_memory(int64_t peak_memory) { this->peak_memory_bytes += peak_memory; }
+
+    void merge(const NodeStatistics& other);
+
+    void to_pb(PNodeStatistics* nodeStatistics);
+
+    void from_pb(const PNodeStatistics& nodeStatistics);
+
+private:
+    int64_t peak_memory_bytes;
+};
+
 // This is responsible for collecting query statistics, usually it consists of
 // two parts, one is current fragment or plan's statistics, the other is sub fragment
 // or plan's statistics and QueryStatisticsRecvr is responsible for collecting it.
 class QueryStatistics {
 public:
     QueryStatistics() : scan_rows(0), scan_bytes(0), cpu_ms(0), returned_rows(0) {}
 
-    void merge(const QueryStatistics& other) {
-        scan_rows += other.scan_rows;
-        scan_bytes += other.scan_bytes;
-        cpu_ms += other.cpu_ms;
-    }
+    void merge(const QueryStatistics& other);
 
     void add_scan_rows(int64_t scan_rows) { this->scan_rows += scan_rows; }
 
     void add_scan_bytes(int64_t scan_bytes) { this->scan_bytes += scan_bytes; }
 
     void add_cpu_ms(int64_t cpu_ms) { this->cpu_ms += cpu_ms; }
 
+    NodeStatistics* add_nodes_statistics(int64_t node_id) {
+        NodeStatistics* nodeStatistics = nullptr;
+        auto iter = nodes_statistics_map.find(node_id);
+        if (iter == nodes_statistics_map.end()) {
+            nodeStatistics = new NodeStatistics;

Review comment:
       You need to add `~QueryStatistics()` to clear the objects in `nodes_statistics_map`, or there will be memory leak.

##########
File path: be/src/runtime/query_statistics.h
##########
@@ -78,6 +93,10 @@ class QueryStatistics {
     // number rows returned by query.
     // only set once by result sink when closing.
     int64_t returned_rows;
+
+    // The statistics of the query on each backend.
+    typedef std::unordered_map<int64_t, NodeStatistics*> NodeStatisticsMap;
+    NodeStatisticsMap nodes_statistics_map;

Review comment:
       ```suggestion
       NodeStatisticsMap _nodes_statistics_map;
   ```

##########
File path: be/src/runtime/plan_fragment_executor.cpp
##########
@@ -81,6 +81,7 @@ Status PlanFragmentExecutor::prepare(const TExecPlanFragmentParams& request,
 
     RETURN_IF_ERROR(_runtime_state->init_mem_trackers(_query_id));
     _runtime_state->set_be_number(request.backend_num);
+    _runtime_state->set_backend_id(request.backend_id);

Review comment:
       Need to check if `backend_Id` is set in `request`, for forward compatibility. Or set a default value for `backend_id` in proto file.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java
##########
@@ -123,6 +124,15 @@ private void auditAfterExec(String origStmt, StatementBase parsedStmt, Data.PQue
             .setStmtId(ctx.getStmtId())
             .setQueryId(ctx.queryId() == null ? "NaN" : DebugUtil.printId(ctx.queryId()));
 
+        if (statistics != null && statistics.getNodesStatisticsCount() != 0) {

Review comment:
       I think this can be merge on BE side? Or FE will become the bottleneck

##########
File path: be/src/runtime/query_statistics.cpp
##########
@@ -19,6 +19,54 @@
 
 namespace doris {
 
+void NodeStatistics::merge(const NodeStatistics& other) {
+    peak_memory_bytes += other.peak_memory_bytes;
+}
+
+void NodeStatistics::to_pb(PNodeStatistics* nodeStatistics) {

Review comment:
       ```suggestion
   void NodeStatistics::to_pb(PNodeStatistics* node_statistics) {
   ```

##########
File path: fe/fe-core/src/main/java/org/apache/doris/plugin/AuditEvent.java
##########
@@ -79,6 +83,8 @@
     public String stmt = "";
     @AuditField(value = "CpuTimeMS")
     public long cpuTimeMs = -1;
+    @AuditField(value = "PeakMemory")

Review comment:
       Add to the end, for compatibility.
   And better use `long` instead of string, so that it can be easily analyzed by application

##########
File path: be/src/runtime/query_statistics.h
##########
@@ -55,21 +79,12 @@ class QueryStatistics {
         scan_bytes = 0;
         cpu_ms = 0;
         returned_rows = 0;
+        nodes_statistics_map.clear();

Review comment:
       This clear() will not release the memory allocated for `NodeStatistics` in `nodes_statistics_map`

##########
File path: be/src/runtime/query_statistics.cpp
##########
@@ -19,6 +19,54 @@
 
 namespace doris {
 
+void NodeStatistics::merge(const NodeStatistics& other) {
+    peak_memory_bytes += other.peak_memory_bytes;
+}
+
+void NodeStatistics::to_pb(PNodeStatistics* nodeStatistics) {
+    DCHECK(nodeStatistics != nullptr);
+    nodeStatistics->set_peak_memory_bytes(peak_memory_bytes);
+}
+
+void NodeStatistics::from_pb(const PNodeStatistics& nodeStatistics) {

Review comment:
       ```suggestion
   void NodeStatistics::from_pb(const PNodeStatistics& node_statistics) {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747347048



##########
File path: fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java
##########
@@ -123,6 +124,15 @@ private void auditAfterExec(String origStmt, StatementBase parsedStmt, Data.PQue
             .setStmtId(ctx.getStmtId())
             .setQueryId(ctx.queryId() == null ? "NaN" : DebugUtil.printId(ctx.queryId()));
 
+        if (statistics != null && statistics.getNodesStatisticsCount() != 0) {

Review comment:
       Now the calculation will be completed in result_sink at the be side with the top level fragment, but the collected node statistics will still be transmitted to Fe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#issuecomment-974766041


   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747344145



##########
File path: be/src/runtime/query_statistics.h
##########
@@ -78,6 +93,10 @@ class QueryStatistics {
     // number rows returned by query.
     // only set once by result sink when closing.
     int64_t returned_rows;
+
+    // The statistics of the query on each backend.
+    typedef std::unordered_map<int64_t, NodeStatistics*> NodeStatisticsMap;
+    NodeStatisticsMap nodes_statistics_map;

Review comment:
       done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747345408



##########
File path: be/src/runtime/query_statistics.h
##########
@@ -27,25 +27,49 @@ namespace doris {
 
 class QueryStatisticsRecvr;
 
+class NodeStatistics {
+public:
+    NodeStatistics() : peak_memory_bytes(0) {};
+
+    void add_peak_memory(int64_t peak_memory) { this->peak_memory_bytes += peak_memory; }
+
+    void merge(const NodeStatistics& other);
+
+    void to_pb(PNodeStatistics* nodeStatistics);
+
+    void from_pb(const PNodeStatistics& nodeStatistics);
+
+private:
+    int64_t peak_memory_bytes;
+};
+
 // This is responsible for collecting query statistics, usually it consists of
 // two parts, one is current fragment or plan's statistics, the other is sub fragment
 // or plan's statistics and QueryStatisticsRecvr is responsible for collecting it.
 class QueryStatistics {
 public:
     QueryStatistics() : scan_rows(0), scan_bytes(0), cpu_ms(0), returned_rows(0) {}
 
-    void merge(const QueryStatistics& other) {
-        scan_rows += other.scan_rows;
-        scan_bytes += other.scan_bytes;
-        cpu_ms += other.cpu_ms;
-    }
+    void merge(const QueryStatistics& other);
 
     void add_scan_rows(int64_t scan_rows) { this->scan_rows += scan_rows; }
 
     void add_scan_bytes(int64_t scan_bytes) { this->scan_bytes += scan_bytes; }
 
     void add_cpu_ms(int64_t cpu_ms) { this->cpu_ms += cpu_ms; }
 
+    NodeStatistics* add_nodes_statistics(int64_t node_id) {
+        NodeStatistics* nodeStatistics = nullptr;
+        auto iter = nodes_statistics_map.find(node_id);
+        if (iter == nodes_statistics_map.end()) {
+            nodeStatistics = new NodeStatistics;

Review comment:
       done. Now the clearNodeStatistics() cleans up all newly allocated memory.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#issuecomment-966374931






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#issuecomment-974766041


   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747345534



##########
File path: be/src/runtime/query_statistics.h
##########
@@ -55,21 +79,12 @@ class QueryStatistics {
         scan_bytes = 0;
         cpu_ms = 0;
         returned_rows = 0;
+        nodes_statistics_map.clear();

Review comment:
       Now the clearNodeStatistics() cleans up all newly allocated memory.

##########
File path: be/src/runtime/query_statistics.cpp
##########
@@ -19,6 +19,54 @@
 
 namespace doris {
 
+void NodeStatistics::merge(const NodeStatistics& other) {
+    peak_memory_bytes += other.peak_memory_bytes;
+}
+
+void NodeStatistics::to_pb(PNodeStatistics* nodeStatistics) {

Review comment:
       done

##########
File path: fe/fe-core/src/main/java/org/apache/doris/plugin/AuditEvent.java
##########
@@ -79,6 +83,8 @@
     public String stmt = "";
     @AuditField(value = "CpuTimeMS")
     public long cpuTimeMs = -1;
+    @AuditField(value = "PeakMemory")

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747347048



##########
File path: fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java
##########
@@ -123,6 +124,15 @@ private void auditAfterExec(String origStmt, StatementBase parsedStmt, Data.PQue
             .setStmtId(ctx.getStmtId())
             .setQueryId(ctx.queryId() == null ? "NaN" : DebugUtil.printId(ctx.queryId()));
 
+        if (statistics != null && statistics.getNodesStatisticsCount() != 0) {

Review comment:
       Now the calculation will be completed at the be side with the top level fragment, but the collected node statistics will still be transmitted to Fe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on a change in pull request #7030: [Feature] Print peak memory use of all backend after query in audit log

Posted by GitBox <gi...@apache.org>.
xy720 commented on a change in pull request #7030:
URL: https://github.com/apache/incubator-doris/pull/7030#discussion_r747349443



##########
File path: be/src/runtime/plan_fragment_executor.cpp
##########
@@ -81,6 +81,7 @@ Status PlanFragmentExecutor::prepare(const TExecPlanFragmentParams& request,
 
     RETURN_IF_ERROR(_runtime_state->init_mem_trackers(_query_id));
     _runtime_state->set_be_number(request.backend_num);
+    _runtime_state->set_backend_id(request.backend_id);

Review comment:
       ok. Now the default value (-1) is set in runtime_state.h. 
   
   If it finds that backend_id equals to -1 during fragment execution, the node statistics of this fragment will not be collected.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org