You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Leif Hedstrom (JIRA)" <ji...@apache.org> on 2013/12/06 19:03:36 UTC
[jira] [Resolved] (TS-2089) one log collation thread is not enough
for busy system
[ https://issues.apache.org/jira/browse/TS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leif Hedstrom resolved TS-2089.
-------------------------------
Resolution: Fixed
I believe this was resolved, so closing it.
> one log collation thread is not enough for busy system
> ------------------------------------------------------
>
> Key: TS-2089
> URL: https://issues.apache.org/jira/browse/TS-2089
> Project: Traffic Server
> Issue Type: Improvement
> Components: Logging
> Reporter: Zhao Yongming
> Assignee: Yunkai Zhang
> Fix For: 4.1.2
>
> Attachments: 0001-TS-2089-introduce-configurable-collation-preproc-thr.patch, 0001-TS-2089-introduce-configurable-collation-preproc-thr.patch.V2
>
>
> we should make log collation more thread & cpu nice.
>
> {code}
> top - 22:34:22 up 3:37, 1 user, load average: 1.20, 1.71, 1.66
> Tasks: 979 total, 2 running, 977 sleeping, 0 stopped, 0 zombie
> Cpu(s): 3.5%us, 0.6%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Mem: 65943744k total, 62536348k used, 3407396k free, 90544k buffers
> Swap: 2097144k total, 0k used, 2097144k free, 54189788k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4651 ats 20 0 12.9g 5.3g 4928 R 100.1 8.5 197:51.78 [LOGGING]
> 4460 ats 20 0 12.9g 5.3g 4928 S 7.9 8.5 18:12.57 [ET_NET 44]
> 4457 ats 20 0 12.9g 5.3g 4928 S 5.3 8.5 7:32.34 [ET_NET 41]
> 4461 ats 20 0 12.9g 5.3g 4928 S 4.3 8.5 6:41.79 [ET_NET 45]
> 4453 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 6:44.52 [ET_NET 37]
> 4463 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 7:54.01 [ET_NET 47]
> 3652 root 39 19 0 0 0 S 1.0 0.0 2:14.15 kipmi0
> 4425 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.21 [ET_NET 9]
> 4426 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.42 [ET_NET 10]
> 4430 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.55 [ET_NET 14]
> 132 root 20 0 0 0 0 S 0.3 0.0 0:00.82 events/1
> 4417 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.82 [ET_NET 1]
> 4418 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:20.02 [ET_NET 2]
> 4420 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.64 [ET_NET 4]
> 4422 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.84 [ET_NET 6]
> 4427 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.94 [ET_NET 11]
> 4435 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.88 [ET_NET 19]
> {code}
> == Update ===
> I have developed a patch to fix this issue. After I rebase it to master branch, I'll attach it to here. Here is the description of my patch:
> ---------------
> We found that CPU of logging thread could be easy to reach up 100% in
> collation host, but disk IO was low at the same time.
> The bottleneck of logging thread is that some preprocessing job, such as
> convert LogBuffer to ascii text, consume so much CPU time. And more
> worse, the write() operation will block logging thread.
> So this patch try to split logging thread into two parts:
> 1) Configurable preproc threads, which are responsiable for processing all
> of prepare work, and then forward the preprocessed buffer to flush thread,
> or send them to CollationClient/HostSM.
> 2) One Flush thread, it will consume preprocessed buffers and write them to
> disk. In our testing, one flush thread is enough for us.
> TODO: This patch supports only one flush thread, we can improve it to
> "one flush thread per file/disk" in the future.
> == How to configure ==
> The number of preproc threads is 1 by default.
> Please modify "proxy.config.log.collation_preproc_threads" option to
> change it.
> ---------------
> ======
> After apply this patch, let's have a look the 'top' output:
> {code}
> [root@test81.cn8 core]# top -H -p 3984 -b -n 1
> top - 15:39:06 up 17:23, 6 users, load average: 3.37, 3.09, 3.15
> Tasks: 246 total, 6 running, 240 sleeping, 0 stopped, 0 zombie
> Cpu(s): 1.4%us, 0.5%sy, 0.0%ni, 97.6%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 65943744k total, 65763084k used, 180660k free, 136688k buffers
> Swap: 2097144k total, 0k used, 2097144k free, 57275428k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4237 ats 20 0 8941m 5.2g 5148 R 39.5 8.3 80:46.23 [LOG_PREPROC 2]
> 4236 ats 20 0 8941m 5.2g 5148 R 37.5 8.3 80:46.53 [LOG_PREPROC 1]
> 4235 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.71 [LOG_PREPROC 0]
> 4238 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:48.37 [LOG_PREPROC 3]
> 4239 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.58 [LOG_PREPROC 4]
> 4240 ats 20 0 8941m 5.2g 5148 R 11.9 8.3 20:45.28 [LOG_FLUSH]
> 4037 ats 20 0 8941m 5.2g 5148 S 7.9 8.3 15:06.60 [ET_NET 38]
> 4045 ats 20 0 8941m 5.2g 5148 S 5.9 8.3 9:55.74 [ET_NET 46]
> 4038 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 6:08.97 [ET_NET 39]
> 4042 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 8:36.73 [ET_NET 43]
> ...
> {code}
> [LOG_PREPROC 0-4] are collation preproc threads, [LOG_FLUSH] is new flush thread. They have good throughput now.
--
This message was sent by Atlassian JIRA
(v6.1#6144)