You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Song Jiacheng (Code Review)" <ge...@cloudera.org> on 2023/07/07 07:14:10 UTC

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Song Jiacheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20166


Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................

MM: Give a chance to do other OP while server is under memory pressure

This patch add an argument to give a chance to do other OP while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.  The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_pressure_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 66 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/1
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 1
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Ashwani Raina (Code Review)" <ge...@cloudera.org>.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................


Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@18
PS14, Line 18: usage 
nit: usage is


http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: this
nit: the


http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: higher memory 
nit: higher the memory


http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: flush
             : MRS/DMS
nit: MRS/DMS flushes


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@7
PS17, Line 7: not always flush even if
nit: Avoid unchecked scheduling of flush operations


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: this higher probability to do flush
nit: higher the probability to flush


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: The higher
nit: Higher the


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27:  
nit: remove whitespace


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27: not_flush_memory_prob
nit: you mean "run_non_memory_ops_prob" ?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@110
PS17, Line 110: maintainer
nit: maybe you could use "system admin" or "user"?

There has to be some mechanism by which user/admin is made aware that this flag can be used to regulate the scheduling of flush operations when node is under memory pressure.

Maybe you could point to some (metrics) or (recommended action via a warning log), visible to user/admin that essentially denote these conditions and suggest to turn this on or increase the probability.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@111
PS17, Line 111: the performance is decreasing.
nit: "there is a significant degradation in performance."


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: FlushOrNot
FlushOrNot becomes confusing when return type is bool. It is not clearly evident from the name whether this method returns true or false when flush is a go.

Maybe rename to "ProceedWithFlush" ?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: bool MaintenanceManager::FlushOrNot(double* used_memory_percentage) {
nit: Add a comment above that describes the purpose of this method, out parameter and return value.



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Tue, 12 Sep 2023 07:30:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................


Patch Set 2:

Hello,
 this patch is for KUDU-3407, so sorry for submit it too late, could you please help me review this?


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 2
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Comment-Date: Fri, 07 Jul 2023 07:46:29 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................


Patch Set 17: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Thu, 31 Aug 2023 07:52:22 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#15).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/15
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 15
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] eKUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#35).

Change subject: eKUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

eKUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/35
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 35
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 36: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1026
PS34, Line 1026:   // take time, so the other_ops_running_times might be greater than expected.
               :   const int64_t memory_op_running_times = op2.run_count();
> I ran it with the parameters and it turns out that AssertEventually will ti
All right, that sounds good to me.  Thank you for verifying!



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 36
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Mon, 20 Nov 2023 19:44:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#21).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operations other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 222 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/21
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 21
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................


Patch Set 14:

(7 comments)

I added a test to show which operation the maintenance manager would like to choose in various policies and workloads, which is involved in KUDU-3488. 
Using the test, we could test all the operations and flags, including the newly added flag in this patch.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc@336
PS9, Line 336: 
> The default value is 0, why set it again?
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h@382
PS9, Line 382:   /// @param [out] used_memory_percentage
> nit: used_memory_percentage
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@104
PS9, Line 104: run_non_memory_ops_pr
> It is easy to be understand to use a positive flag. How about: run_non_memo
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@107
PS9, Line 107: mory operations "
> non-memory operations waiting to be ran.
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@109
PS9, Line 109: ead of flushing ops. This might be needed to turn "
             :               "on if maintainer found that the tablet server is under memory "
             :               "pressure for a long time and 
> I think your purpose is to make a balance between running memory and non-me
It is computed by the memory usage. Maintainer may set a certain value and the exact probability is calculated by the value and the memory usage. I think the only convincible factor is memory usage, other factors, like perf scores of operations, are not really reliable.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@111
PS9, Line 111: pressure for a 
> Reading or wring performance?
Both, I think. Row set height is getting higher, redo files are getting larger, so both write performance and read performance might be worse.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@112
PS9, Line 112: TAG_FLAG(run_non_memory_ops_prob, experimental
> It should also add 'runtime' flag?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 14
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Thu, 24 Aug 2023 09:05:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> (17 comments)
> 
> Thank you very much for the patch!
> 
> I'm curious: are there any particular workloads that benefit a lot from the updated behavior of the 'next best op' selection?

About the test, there is another test about the whole maintenance manager, testing which operations will be run in various policies, which you mentioned in JIRA. But it's based on this patch so I post this patch first.


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 3
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Comment-Date: Wed, 12 Jul 2023 03:30:42 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................


Patch Set 5:

(17 comments)

> Patch Set 5: Verified-1
> 
> Build Failed 
> 
> http://jenkins.kudu.apache.org/job/kudu-gerrit/28183/ : FAILURE

This unit test TestPrioritizeLogRetentionUnderMemoryPressure works well in my PC.

http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG
Commit Message:

PS3: 
> Please follow this generic git guideline to properly format the commit mess
Changed the tittle and still not sure it's ok.


http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG@7
PS3, Line 7: -3407: Give a chance to do other maintenance operations while serv
> Do you have any measurements/results that show how much improvement this pr
This patch has been working in our clusters(more than 100 clusters) for over half a year. It does make some influence since the performance will not be too low even if the cluster stay in high memory pressure for very long. According to web page and log, tablet server is scheduling some other operations like CompactRowsetOp, MajorCompactOp, which have big impact on the performance of tablet server.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc@919
PS3, Line 919:  FLAGS_not_flush_memory_prob = 1;
> This test scenario covers just one corner case.  It would be great to add m
About the test, there is another test about the whole maintenance manager, testing which operations will be run in various policies, which you mentioned in JIRA. But it's based on this patch so I post this patch first.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: ressure 
> memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: ends o
> the server
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379:  r
> run
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@380
PS3, Line 380: he flag no
> Please describe the parameter and the return value.
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: 0,
> What sort of testing has been done to make sure setting the default value t
That's the default value in our clusters and it works well,  but still I changed it to 0.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> Consider adding a validator for this flag: IIUC, the only allowed values ar
Added validator for not_flush_memory_prob and also data_gc_prioritization_prob.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@105
PS3, Line 105: "
> memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@106
PS3, Line 106:  server 
> memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@107
PS3, Line 107: d there are oth
> to run
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519: 
> even if under memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519: 
> What are 'performance ops'?
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522:   // anchors the most WALs (the op should also free memory).
             :   //
> Would it would be easier to comprehend if calling memory_pressure_func_() f
That would be much better.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@715
PS3, Line 715: void Main
> nit: the indent here should be 4 spaces per Kudu's C++ code style
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@716
PS3, Line 716:   running
> ditto
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 5
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 12 Jul 2023 09:04:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#17).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob. As the memory increases, it gradually 
decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/17
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#22).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 222 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/22
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 22
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#19).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 222 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/19
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 19
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 31:

> Build Failed
 > 
 > http://jenkins.kudu.apache.org/job/kudu-gerrit/28625/ : FAILURE

LINT isn't yet happy, but the rest seem to be unrelated test failures (at least a first glance).  As for the LINT, please address the following:

04:56:46 /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/maintenance_manager-test.cc:354:  Lines should very rarely be longer than 100 characters  [whitespace/line_length] [4]
04:56:46 Total errors found: 1


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 31
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Thu, 19 Oct 2023 17:50:04 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 31:

(16 comments)

Thank you for the review and your attention for such a long time.

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@11
PS29, Line 11: wal
> nit: WAL
Done


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@12
PS29, Line 12: eventually break
             : due to OOM
> I'm curious what do you think was the actual reason behind that OOM conditi
Yes, normally tablet servers reject write requests if their memory usage reach 80%, but sometimes the usage might be higher than 100% and got killed by system(The system memory runs out).
I think the reason is that the running memory is getting higher because of the worse performance. I could always find that the running memory would be high if the cluster is lack of CPU or I/O. I think their root cause are the same--read/write operation takes longer.


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@16
PS29, Line 16: server is under memory pressure
> By your experience with applying this functionality in a real cluster, woul
Not really actually, this patch has been applied in over 100 clusters for about a year and the value run_non_memory_ops_prob is set to 0.2, I did not find any OOM because of this.
The probability of running compactions is decreasing as the memory usage getting higher. So actually the behaviors while the memory usage is higher than 80% are the same. And with better performance, read/write operation will not take too long even if under memory pressure.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@314
PS29, Line 314: 
> nit: drop 'that'
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@316
PS29, Line 316: y times the operat
> nit: usually it's called queue time or something, meaning how much time an 
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@318
PS29, Line 318: e oper
> nit: maybe, name this 'run_count_'?
Done.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@389
PS29, Line 389: 
> I guess it makes sense to rename this field since its semantics is differen
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@982
PS29, Line 982: tMaintenanceOp op1("perf_op", MaintenanceOp::HIGH_IO_USAGE)
> What is the probability of actually choosing a memory flushing op here?  In
This test does not make too much sense for now. The test below is actually covering it, so delete it.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@985
PS29, Line 985: p_time(MonoDelta::FromMilliseconds(5));
> nit for here and elsewhere for ASSERT_EQ(): the expected value comes first
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@998
PS29, Line 998: ister_self(true);
> Even with probabilistic behavior, isn't it always possible at least to chec
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@999
PS29, Line 999: 
> Given this test scenario runs way over 3 seconds, please add  SKIP_IF_SLOW_
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1006
PS29, Line 1006:   FLAGS_enable_maintenance_manager = true;
               :   // Wait for the memory_op to run over 1000 times and then check the running times
               :   // of other operations.
> Why to output this if all these variables aren't even changing at this poin
I thought the test is provided to the users to set different parameters and run multi times, done.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1035
PS29, Line 1035: O) << Substitute("
> What is 'MaintenanceMgr num'?
It is the thread name of maintenance manager. I will make it clear.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1036
PS29, Line 1036:                 
> Wrap this with NO_FATALS() ?
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1044
PS29, Line 1044: 
> Why to wait so long?  Would be great to comment on this.  Alternatively, ma
Done


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1046
PS29, Line 1046: 
               : 
> Is there a way to check for the number of operations still running or opera
50ms is enough, which is also accepted, I think..



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 31
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Fri, 20 Oct 2023 06:28:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#33).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/33
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 33
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#14).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than WAL gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 213 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/14
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 14
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#6).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch add an argument to give a chance to do other ops while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%.
As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 87 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/6
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 6
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#7).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch add an argument to give a chance to do other ops while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%.
As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 101 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/7
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#2).

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................

MM: Give a chance to do other OP while server is under memory pressure

This patch add an argument to give a chance to do other OP while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.  The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 66 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/2
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 2
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#18).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 221 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/18
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 18
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#34).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch adds an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/34
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 34
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Ashwani Raina (Code Review)" <ge...@cloudera.org>.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
Did you run any practical workloads to see other maintenance ops in action when memory usage is high?

I want to understand the impact of not choosing flush ops when memory usage is high for long time.

Also, what factors are to be kept in mind when deciding to increase/decrease a probability of running flush ops?


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
If this is going to be 0(i.e. DRS/MRS flush ops will be scheduled as per priority), you might want to think about providing some to hint to user that it is time to avoid flushing DRS and MRS when under memory pressure.



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Fri, 14 Jul 2023 13:51:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Reviewed-on: http://gerrit.cloudera.org:8080/20166
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <al...@apache.org>
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 37
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#25).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/25
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 25
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
You solution only distinguishes the memory related ops and non-memory related ops. How about distributing every op a priority, so you can adjust the priority of every type of ops to decide which op should be executed firstly?



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Thu, 13 Jul 2023 08:50:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#10).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 207 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/10
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 10
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#12).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 212 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/12
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 12
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 23:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@7
PS17, Line 7: Avoid unchecked scheduli
> nit: Avoid unchecked scheduling of flush operations
Done


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: higher the probability to flush
> nit: higher the probability to flush
Done


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: Higher the
> nit: Higher the
Done


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27: e
> nit: remove whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27: 
Yes..
I modified the flag name and forgot to correct the commit message.
Done.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@151
PS17, Line 151: gister_self_) {
> Consider removing this once done debugging/troubleshooting the new test -- 
TSAN will report data race if I remove this, which I think is a wrong report. 
Please check the patch 18~22 to view the test log.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@314
PS17, Line 314: MonoTim
> nit: why not to use the MonoDelta type here?
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@315
PS17, Line 315: Sum of schedul
> nit: How many times the operation has been run.
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@989
PS17, Line 989: ob = 1;
> Is it possible to assert on the range of the expected counters, etc.?  Give
Yes, mostly the counter is in a range, but I am afraid that there will be some accidental compile failures if I assert that the counter is in a range, which will be confusing.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@379
PS17, Line 379: memory pressure
> the memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@386
PS17, Line 386:   bool ProceedWithFlush(double* used_memory_percentage);
> nit: you could use a free format in a non-public API when documenting metho
Got it, thanks!


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@105
PS17, Line 105: th
> nit: Who's "we" here?  It's better to be more specific on who does what her
Thans you for the review!
Done.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@108
PS17, Line 108: run
> run
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@110
PS17, Line 110: system adm
> nit: maybe you could use "system admin" or "user"?
Thank you for the review!
A user should turn on this switch if the tablet server is under memory pressure and there are some high-perf-score operations not scheduled for a long time.
If I try to warn the users according to the logic above, some extra and complex codes need to be added to the maintenance scheduler thread. I guess this might make a bad impact on the scheduler. Also, it not worth to create another thread just for making a warn log.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@111
PS17, Line 111: there is a significant degrada
> nit: "there is a significant degradation in performance."
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@136
PS17, Line 136: emory_pressure_percent
> What is the unit of the usage?
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@539
PS17, Line 539: pc
> under
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@542
PS17, Line 542:           
> Alternatively, consider calling memory_pressure_func_() functor as it is, b
Yes, this is much more reasonable.
Done


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: tatic_cast
> FlushOrNot becomes confusing when return type is bool. It is not clearly ev
Yes, That is much better, thank you!


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730:     double soft_limit = static_cast<double>(FLAGS_memory_limit_soft_p
> nit: Add a comment above that describes the purpose of this method, out par
Sorry, I'm not sure. There is a comment of this method in file maintenance_manager.h, should I add another one here?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@732
PS17, Line 732:         rand_.NextDoubleFraction() >= FLAGS_run_non_memory_ops_prob *
              :         (soft_limit - *used_memory_percentage) / (soft_limit - pressure_threshold);
              :   }
> Instead of introducing this extra test flag and adding this extra logic, co
Got it, done.



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 23
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Fri, 22 Sep 2023 09:47:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has removed Alexey Serbin from this change.  ( http://gerrit.cloudera.org:8080/20166 )

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................


Removed reviewer Alexey Serbin.
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 2
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#31).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/31
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 31
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#8).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch add an argument to give a chance to do other ops while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%.
As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 103 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/8
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................


Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@151
PS17, Line 151: DLOG(INFO) << "Re-registering op " << this->name();
Consider removing this once done debugging/troubleshooting the new test -- it isn't used by the test, but pollutes the output.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@314
PS17, Line 314: int64_t
nit: why not to use the MonoDelta type here?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@315
PS17, Line 315: Perform count;
nit: How many times the operation has been run.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@989
PS17, Line 989: not assert anything
Is it possible to assert on the range of the expected counters, etc.?  Given the amount of memory is pre-determined and doesn't depend on the actual amount of memory that a node has, I'd expect that the scenario should preserve some particular traits even if its behavior is driven by some stochastic factors.  Running the scenario multiple times should converge to a pretty narrow range of the result counters, no?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@379
PS17, Line 379: server pressure
the memory pressure


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@386
PS17, Line 386:   bool FlushOrNot(double* used_memory_percentage);
nit: you could use a free format in a non-public API when documenting methods and fields; the doxygen format isn't a requirement here


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522:   // Look at ops that we can run quickly that free up log retention.
             :   if (low_io_most_logs_retai
> That would be much better.
Alternatively, updating the `memory_pressure_func_` with the new function could be a way to go.  Basically, having less parts for the condition would be better.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@105
PS17, Line 105: we
nit: Who's "we" here?  It's better to be more specific on who does what here.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@108
PS17, Line 108: ran
run


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@136
PS17, Line 136: Simulated memory usage
What is the unit of the usage?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@539
PS17, Line 539: in
under


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@542
PS17, Line 542: FlushOrNot
Alternatively, consider calling memory_pressure_func_() functor as it is, but assign it to a different method/function, as necessary?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@732
PS17, Line 732:     if (PREDICT_FALSE(FLAGS_memory_simulate_for_test != 0.0)) {
              :       *used_memory_percentage = FLAGS_memory_simulate_for_test;
              :     }
Instead of introducing this extra test flag and adding this extra logic, consider setting the memory_pressure_func_ via the MaintenanceManager::set_memory_pressure_func_for_tests() method for specific tests, similar to what's done in MaintenanceManagerTest::StartManager()?

The extra level of indirection introduced by MaintenanceManager::memory_pressure_func_ is there just for this particular purpose, so introducing an extra flag looks like an overkill if having that provision already.



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Wed, 06 Sep 2023 23:46:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#23).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 223 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/23
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 23
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#30).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 221 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/30
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 30
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#5).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch add an argument to give a chance to do other OP while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.  The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 87 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/5
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 5
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 29:

(6 comments)

Overall looks good to me, just a few questions and nits.

Thank you very much for working on this!

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@12
PS29, Line 12: eventually break
             : due to OOM
> Yes, normally tablet servers reject write requests if their memory usage re
Right -- overall performance is dropping, so every operation takes longer time to complete.  However, longer running operations leads make the memory pressure building up faster, and the requests are rejected with a higher ratio, right? If so, then it's not clear how OOM could result out of such performance drop.  Actually, the memory pressure mechanism is there to prevent OOM under such conditions.

I guess the OOM could happen because of at least the following:
  1. Some operations aren't accounted for and run despite of the memory pressure, and they might consume a lot of memory.
  2. Operations are given wrong memory estimates when the process is on the threshold of an OOM condition, and allowed to run and they in fact consume much more memory they estimated, so it results in OOM condition.

I know that item 1 is present for sure, and it's reported by KUDU-3406: running CompactRowSetsOp on a long history of deltas might consume a lot of memory.

Do you think it could be the culprit behind the OOM in this case as well?  Have you tried to check memory profile of the processes that were eventually killed by the OOM killer?


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@16
PS29, Line 16: server is under memory pressure
> Not really actually, this patch has been applied in over 100 clusters for a
Ah, that's good news.  Thanks for the clarification.


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@584
PS34, Line 584: 
nit: fix the alignment for this line


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1022
PS34, Line 1022: 
What's "e"?  This looks like an incomplete word or a typo.


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1023
PS34, Line 1023: ntenan
nit: greater


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1026
PS34, Line 1026:   op3.set_sleep_time(MonoDelta::FromMilliseconds(30));
               :   op3.set_register_self(true);
These margins make sense from the theoretical perspective, but how did you verify that these margins are safe enough to avoid flakiness in this test?  

Just curious: did you try to run this test scenario a test with --stress_cpu_threads=16 flag in TSAN and ASAN build configurations many times (say, setting --gtest_repeat=200)?



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 29
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Tue, 14 Nov 2023 03:59:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#24).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 223 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/24
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 24
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#3).

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................

MM: Give a chance to do other OP while server is under memory pressure

This patch add an argument to give a chance to do other OP while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.  The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 67 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/3
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 3
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#4).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch add an argument to give a chance to do other OP while server is under memory pressure.

This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage.  The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 85 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/4
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 4
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................


Patch Set 7:

> Patch Set 7: Verified-1
> 
> Build Failed 
> 
> http://jenkins.kudu.apache.org/job/kudu-gerrit/28185/ : FAILURE

Seems that the failed test is kudu-tool-test, whose fail reason is Broken pipe. I guess it's not related to this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 12 Jul 2023 11:28:50 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#9).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 103 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/9
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 9
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#13).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 213 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/13
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 13
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#11).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of the tservers worse and worse and eventually
break due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 207 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/11
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 11
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#28).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually 
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/28
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 28
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] eKUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: eKUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 35:

(5 comments)

Thanks for the review~

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@12
PS29, Line 12: eventually break
             : due to OOM
> Right -- overall performance is dropping, so every operation takes longer t
Theoretically, yes, it should not be much higher than 80%. But I could always find a tserver whose memory is higher than 90, sometimes 100%. I had looked into the memory profile and I think everything is OK except some big SchemaPBs from tombstoned replicas, which I have raised another issue KUDU-3486 to fix it.
But the memory of tombstoned replicas is also tracked by the memory tracker, so the memory pressure mechanism should still work. 
I think the reason might be the scan requests, or the concurrent writes or I/O problem. And all of them could be optimized by maintenance operations.
And also, sometimes the performance might not be the root cause of OOM. I still think this mechanism is needed since I always find that there are many operations pending with high perf improvement score when the tserver is under memory pressure.


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@584
PS34, Line 584:   me
> nit: fix the alignment for this line
Done


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1022
PS34, Line 1022: l
> What's "e"?  This looks like an incomplete word or a typo.
Done


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1023
PS34, Line 1023: s * pr
> nit: greater
Done


http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1026
PS34, Line 1026:   // take time, so the other_ops_running_times might be greater than expected.
               :   const int64_t memory_op_running_times = op2.run_count();
> These margins make sense from the theoretical perspective, but how did you 
I ran it with the parameters and it turns out that AssertEventually will time out until memory_op run 1000 times. 
No other problem except that, the range is big enough to avoid the flakiness.
Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 35
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Tue, 14 Nov 2023 17:19:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> (17 comments)
> 
> Thank you very much for the patch!
> 
> I'm curious: are there any particular workloads that benefit a lot from the updated behavior of the 'next best op' selection?

Thanks a lot for your review, all the comments are very precious for me!
In some clusters, the memory usages are always about 60%, in which case maintenance manager always tries to find a flush operation to run, but actually there are many other important operations to run.
For example, we might have set some priorities to some tables, so the perf scores of the tablets which belong to the priority could be very high, which means they are very important to the cluster, but they are never executed because of the memory pressure. And the memory usage is getting higher and higher because the performance of the tablets is getting worse.
I'm fixing this patch according to your comments, and will post again today, thanks again~


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 3
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Comment-Date: Wed, 12 Jul 2023 03:27:29 +0000
Gerrit-HasComments: No

[kudu-CR] MM: Give a chance to do other OP while server is under memory pressure

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: MM: Give a chance to do other OP while server is under memory pressure
......................................................................


Patch Set 3:

(17 comments)

Thank you very much for the patch!

I'm curious: are there any particular workloads that benefit a lot from the updated behavior of the 'next best op' selection?

http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG
Commit Message:

PS3: 
Please follow this generic git guideline to properly format the commit message:
  https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines

You can find more useful resources at this page:
  https://kudu.apache.org/docs/contributing.html#_submitting_patches


http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG@7
PS3, Line 7: Give a chance to do other OP while server is under memory pressure
Do you have any measurements/results that show how much improvement this provides under a particular workload?

Thanks!


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc@919
PS3, Line 919:  FLAGS_not_flush_memory_prob = 1;
This test scenario covers just one corner case.  It would be great to add more scenarios where other values for --not_flush_memory_prob are used, at least to make sure the ratio of selected maintenance operations is as expected per the setting of the newly introduced flag.

Thanks!


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: pressure
memory pressure


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: do
run


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: server
the server


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@380
PS3, Line 380: FlushOrNot
Please describe the parameter and the return value.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: DEFINE_double(not_flush_memory_prob, 0.2,
Consider adding a validator for this flag: IIUC, the only allowed values are in the range of [0, 1]


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: 0.2
What sort of testing has been done to make sure setting the default value to 0.2 doesn't introduce any performance issues for existing workloads?

If no testing has been performed, maybe set the default value for this parameter as 0 to maintain the legacy behavior of the new code by default?


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@105
PS3, Line 105: pressure
memory pressure


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@106
PS3, Line 106: pressure
memory pressure


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@107
PS3, Line 107: need to be done
to run


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519: some performance ops
What are 'performance ops'?


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519: even in memory pressure
even if under memory pressure


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522:   if (memory_pressure_func_(&capacity_pct) && most_logs_retained_bytes_ram_anchored_op &&
             :     FlushOrNot(capacity_pct)
Would it would be easier to comprehend if calling memory_pressure_func_() from within FlushOrNot()?


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@715
PS3, Line 715:          
nit: the indent here should be 4 spaces per Kudu's C++ code style


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@716
PS3, Line 716:          
ditto



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 3
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Comment-Date: Wed, 12 Jul 2023 00:57:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
> Did you run any practical workloads to see other maintenance ops in action 
Thank you for your review!
Yes, I did. This patch has worked in our clusters for a long time. Maintenance manager does schedule some operations other than flush ops while under memory pressure. 
I have a test, which is involved in KUDU-3488, testing various policies of maintenance manager, including this not_flush_memory_prob. And it shows that the not-flush mechanism works.
Sometimes the memory usage of tablet servers stay at about 60%(memory pressure threshold), because the write workload and the flush ability of maintenance manager are almost equal, in which case the high perf score operations can not be run. And eventually the performance of the tablet server will be lower and lower, leading to higher memory usage, and finally a vicious circle appears.
A user might want to turn on this if he find that the situation which is described above occurred, and he need to find the balance between the performance and memory.
Actually, The probability is decreasing while the memory usage is getting close to the 80%(memory soft limit), so mostly the memory usage won't be too high.


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> You solution only distinguishes the memory related ops and non-memory relat
Thanks for your comment!
I think the ops which we want to run have already got a high perf score, the only reason they can't be run is that FindBestOp always do flush ops if under memory pressure. For now, I think the perf score mechanism is able to find the best op to run after tuning the configurations and table priorities.


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
> If this is going to be 0(i.e. DRS/MRS flush ops will be scheduled as per pr
Exactly, I will commit another patch with more information.
Thanks!



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Tue, 25 Jul 2023 07:33:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#20).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 222 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/20
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 20
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#16).

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher memory usage is, this higher probability to do flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
not_flush_memory_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/16
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 16
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
......................................................................


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20166/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/9//COMMIT_MSG@8
PS9, Line 8: 
           : This patch add an argument to give a chance to do other ops while
           : server is under memory pressure.
           : 
The background of this patch should also be introduced.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc@336
PS9, Line 336:   FLAGS_not_flush_memory_prob = 0.0;
The default value is 0, why set it again?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h@382
PS9, Line 382:   /// @param [out] capacity_pct
nit: used_memory_percentage


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@104
PS9, Line 104: not_flush_memory_prob
It is easy to be understand to use a positive flag. How about: run_non_memory_ops_prob?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@107
PS9, Line 107: operations to run
non-memory operations waiting to be ran.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@109
PS9, Line 109: This might be needed to turn on if maintainer found "
             :               "that the tablet server is under memory pressure for a long time and "
             :               "the performance is decreasing
I think your purpose is to make a balance between running memory and non-memory operations. But It is hard to decide the value of not_flush_memory_prob by  the maintainer. How about to computing the not_flush_memory_prob automatically, and using a switch to turn on this feature?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@111
PS9, Line 111: the performance
Reading or wring performance?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@112
PS9, Line 112: TAG_FLAG(not_flush_memory_prob, experimental);
It should also add 'runtime' flag?



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 9
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Fri, 04 Aug 2023 08:29:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#29).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually 
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/29
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 29
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#27).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it 
gradually decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/27
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 27
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#26).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other ops while
server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob, which is 0.2 by default, when the memory
usage is 60%.
As the memory increases, it gradually decreases to 0, when the
memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/26
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 26
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................


Patch Set 29:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@11
PS29, Line 11: wal
nit: WAL


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@12
PS29, Line 12: eventually break
             : due to OOM
I'm curious what do you think was the actual reason behind that OOM condition.  If the tablet server flushes MRSs, and pushes back the incoming write operations according to the memory pressure, even if each read/write operation takes longer and longer, I'd assume that would just make the memory pressure higher (so less write operations per second are admitted), but why does it eventually goes OOM?


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@15
PS29, Line 15: add
nit: adds


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@16
PS29, Line 16: server is under memory pressure
By your experience with applying this functionality in a real cluster, wouldn't it increase the chances of an OOM condition?  IIUC, avoiding flushing MRSs and instead focusing on running compactions would be triggering an OOM under heavy load, no?


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@27
PS29, Line 27:  
nit: remove the trailing space


http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@28
PS29, Line 28: thememory
nit: the memory


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@314
PS29, Line 314: that
nit: drop 'that'


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@316
PS29, Line 316: schedule_time_sum_
nit: usually it's called queue time or something, meaning how much time an entity has spent waiting in the queue after it had been added into the queue of runnable tasks


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@318
PS29, Line 318: count_
nit: maybe, name this 'run_count_'?


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@389
PS29, Line 389: indicate_memory_pressure_
I guess it makes sense to rename this field since its semantics is different now: 'memory_pressure_pct_' might be a good option.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@982
PS29, Line 982: Perf improvement op should be scheduled, not the memory op.
What is the probability of actually choosing a memory flushing op here?  In other words, how often do you expect this to fail?  Would be great to have a comment about that here.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@985
PS29, Line 985: op1.DurationHistogram()->TotalCount(), 1
nit for here and elsewhere for ASSERT_EQ(): the expected value comes first


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@998
PS29, Line 998: This test would not assert anything since it tests the probability flags
Even with probabilistic behavior, isn't it always possible at least to check against the expected number of operations with a wide, but still meaningful safety margin?  If running thousands of operations, I'd think that those numbers are within particular bounds that depend on the induced memory pressure level, with very high probability, and having 1 out of 1 trillion runs failed isn't a big deal -- the natural flakiness of the IC infrastructure is much higher than that :)


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@999
PS29, Line 999: ComprehensiveTest
Given this test scenario runs way over 3 seconds, please add  SKIP_IF_SLOW_NOT_ALLOWED() as the very first line in this scope to skip running this if KUDU_ALLOW_SLOW_TESTS env variable isn't set to 1.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1006
PS29, Line 1006:       "run_non_memory_ops_prob = $1, data_gc_prioritization_prob = $2",
               :       indicate_memory_pressure_.load(), FLAGS_run_non_memory_ops_prob,
               :       FLAGS_data_gc_prioritization_prob);
Why to output this if all these variables aren't even changing at this point after they have been assigned in the code just above?  Consider removing this logging since it doesn't provide any useful information from the standpoint of an automated test.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1035
PS29, Line 1035: MaintenanceMgr num
What is 'MaintenanceMgr num'?


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1036
PS29, Line 1036: StartManager(1);
Wrap this with NO_FATALS() ?


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1044
PS29, Line 1044:   SleepFor(MonoDelta::FromMilliseconds(10000));
Why to wait so long?  Would be great to comment on this.  Alternatively, may be it's better to rather check for the number of operations run by the maintenance manager so far, and shoot for a fixed threshold there.  That would be better if thinking about variations across hardware capabilities of the node where this test is run.

You could use DurationHistogram()->TotalCount() for an operation to check how many times it has been run so far.


http://gerrit.cloudera.org:8080/#/c/20166/29/src/kudu/util/maintenance_manager-test.cc@1046
PS29, Line 1046:   // Wait until all the operations are done.
               :   SleepFor(MonoDelta::FromMilliseconds(8000));
Is there a way to check for the number of operations still running or operations that are to be scheduled?  E.g., check against maintenance_ops_running_, etc.?



-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 29
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Comment-Date: Mon, 16 Oct 2023 21:10:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#36).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/36
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 36
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>

[kudu-CR] KUDU-3407 Avoid unchecked scheduling of flush operations.

Posted by "Song Jiacheng (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#32).

Change subject: KUDU-3407 Avoid unchecked scheduling of flush operations.
......................................................................

KUDU-3407 Avoid unchecked scheduling of flush operations.

In some clusters, the memory usages of tservers might be 60% ~ 80%
for a long time. During this time the maintenance manager will not
run any operation other than wal gc and MRS/DRS flushes, which will
make the performance of tservers worse and worse and eventually break
due to OOM.

This patch add an argument to give a chance to do other operations
while server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
Higher the memory usage is, higher the probability to flush
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of
run_non_memory_ops_prob. As the memory increases, it gradually
decreases to 0, when thememory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 200 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/32
-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 32
Gerrit-Owner: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <so...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>