Posted to issues@hbase.apache.org by "Guanghao Zhang (Jira)" <ji...@apache.org> on 2020/08/20 00:44:00 UTC

[jira] [Comment Edited] (HBASE-23956) Use less resources running tests

    [ https://issues.apache.org/jira/browse/HBASE-23956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180893#comment-17180893 ] 

Guanghao Zhang edited comment on HBASE-23956 at 8/20/20, 12:43 AM:
-------------------------------------------------------------------

[~stack] Sir, I found that HBASE-23956 changed the hbase.server.thread.wakefrequency config from 1000 to 100, so many chore threads are now scheduled every 100 ms. These chores produce a lot of log lines, such as:
  
 2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
 2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
 2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
 2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
 2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
 2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
 2020-08-19 01:21:00,001 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
 2020-08-19 01:21:00,004 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): StorefileRefresherChore execution time: 2 ms.
 2020-08-19 01:21:00,004 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): StorefileRefresherChore execution time: 2 ms.
  
 What is the reason for this change?
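
To put a rough number on the log volume, here is a minimal sketch of the arithmetic. It assumes each chore's period equals the wake frequency and that each run emits one DEBUG line, as in the excerpt above. The class name, method name, and chore count of 2 are hypothetical and are not taken from HBase.

```java
public class ChoreLogVolume {
    /**
     * Estimated chore executions per minute.
     * Assumption: every chore fires once per wake-frequency period.
     */
    static long runsPerMinute(long wakeFrequencyMs, int choreCount) {
        long runsPerChore = 60_000L / wakeFrequencyMs; // periods that fit in one minute
        return runsPerChore * choreCount;
    }

    public static void main(String[] args) {
        int chores = 2; // hypothetical count, e.g. CompactionChecker and MemstoreFlusherChore
        System.out.println("1000 ms: " + runsPerMinute(1000, chores) + " runs/min"); // 120
        System.out.println(" 100 ms: " + runsPerMinute(100, chores) + " runs/min");  // 1200
    }
}
```

Under these assumptions the change multiplies the per-server chore log lines by ten, and a test that starts several region servers in one JVM multiplies that again.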


was (Author: zghaobac):
Sir, I found HBASE-23956 changed hbase.server.thread.wakefrequency config from 1000 to 100. So many chore threads will be scheduled every 100ms. There are a lot of logs about these chore threads, such as:
 
2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:21:00,001 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:21:00,004 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): StorefileRefresherChore execution time: 2 ms.
2020-08-19 01:21:00,004 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): StorefileRefresherChore execution time: 2 ms.
 
The reason for this change is? 

> Use less resources running tests
> --------------------------------
>
>                 Key: HBASE-23956
>                 URL: https://issues.apache.org/jira/browse/HBASE-23956
>             Project: HBase
>          Issue Type: Test
>          Components: test
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0
>
>         Attachments: Screen Shot 2020-03-09 at 10.11.26 PM.png, Screen Shot 2020-03-09 at 10.11.35 PM.png, Screen Shot 2020-03-09 at 10.20.08 PM.png, Screen Shot 2020-03-10 at 7.10.15 AM.png, Screen Shot 2020-03-10 at 7.10.28 AM.png
>
>
> Our tests can create thousands of threads, all in the one JVM. Using fewer means less memory, less contention, likelier passes, and later, more possible parallelism.
> I've been studying the likes of TestNamespaceReplicationWithBulkLoadedData to see what it does as it runs (this test puts up 4 clusters with replication between them). It peaks at 2k threads. After some configuration and using less HDFS, it's possible to get it down to ~800 threads and about half the memory used. HDFS is a main offender: DataXceivers (Server and Client), jetty threads, Volume threads (async disk 'worker' then another for cleanup...), image savers, ipc clients -- new thread per incoming connection w/o bound (or reuse), block responder threads, anonymous threads, and so on. Many are not configurable or boundable or are hard-coded; e.g. each volume gets 4 workers regardless. The biggest impact came from just lowering the count of data nodes. TODO: a follow-on that turns down DN counts in all tests.
> I've been using Java Flight Recorder during this study. Here is how you get a flight recording for a single test run:
> {code:java}
> MAVEN_OPTS=" -XX:StartFlightRecording=disk=true,dumponexit=true,filename=recording.jfr,settings=profile,path-to-gc-roots=true,maxsize=1024m" mvn test -Dtest=TestNamespaceReplicationWithBulkLoadedData -Dsurefire.firstPartForkCount=0 -Dsurefire.secondPartForkCount=0
> {code}
> i.e. start recording on mvn launch, bound the size of the recording, and have the test run in the mvn context (DON'T fork). It is also useful to connect to the running test at the same time from JDK Mission Control. We do the latter because the thread reporting screen is overwhelmed by the count of running threads, and if you connect live, you can at least get a 'live threads' graph w/ count as the test progresses. When the test finishes, it dumps a .jfr file which can be opened in JDK MC.
> I've been compiling w/ JDK8 and then running w/ JDK11 so I can use JDK MC Version 7, the non-commercial latest. Works pretty well. Let me put up a patch for tests that cuts down thread counts where we can.
> Let me put up a patch that does a first pass on curtailing resource usage.
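
The description above reads live thread counts from JDK Mission Control. As a rough alternative, the same numbers can be sampled from inside the JVM with the standard java.lang.management API. The sketch below is only an illustration and not part of the patch; the class and method names are hypothetical, and it only sees threads in the JVM it runs in, which is why it fits the non-forked setup described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountSketch {
    /** Returns {live, peak, daemon} thread counts for the current JVM. */
    static int[] snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();         // live threads, daemon and non-daemon
        int peak = threads.getPeakThreadCount();     // peak live count since JVM start or last reset
        int daemon = threads.getDaemonThreadCount(); // live daemon threads only
        return new int[] { live, peak, daemon };
    }

    public static void main(String[] args) {
        int[] s = snapshot();
        System.out.printf("live=%d peak=%d daemon=%d%n", s[0], s[1], s[2]);
    }
}
```

Calling snapshot() before and after a mini cluster starts (for example from a test's setup and teardown) would give a coarse before/after comparison without opening a recording.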



--
This message was sent by Atlassian Jira
(v8.3.4#803005)