You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Yunkai Zhang (JIRA)" <ji...@apache.org> on 2013/08/15 14:57:48 UTC
[jira] [Work stopped] (TS-2089) one log collation thread is not enough for busy system

     [ https://issues.apache.org/jira/browse/TS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on TS-2089 stopped by Yunkai Zhang.

> one log collation thread is not enough for busy system
> ------------------------------------------------------
>
>                 Key: TS-2089
>                 URL: https://issues.apache.org/jira/browse/TS-2089
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Logging
>            Reporter: Zhao Yongming
>            Assignee: Yunkai Zhang
>             Fix For: 3.5.2
>
>
> we should make log collation more thread & cpu nice.
>  
> {code}
> top - 22:34:22 up  3:37,  1 user,  load average: 1.20, 1.71, 1.66
> Tasks: 979 total,   2 running, 977 sleeping,   0 stopped,   0 zombie
> Cpu(s):  3.5%us,  0.6%sy,  0.0%ni, 95.6%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Mem:  65943744k total, 62536348k used,  3407396k free,    90544k buffers
> Swap:  2097144k total,        0k used,  2097144k free, 54189788k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4651 ats       20   0 12.9g 5.3g 4928 R 100.1  8.5 197:51.78 [LOGGING]
>  4460 ats       20   0 12.9g 5.3g 4928 S  7.9  8.5  18:12.57 [ET_NET 44]
>  4457 ats       20   0 12.9g 5.3g 4928 S  5.3  8.5   7:32.34 [ET_NET 41]
>  4461 ats       20   0 12.9g 5.3g 4928 S  4.3  8.5   6:41.79 [ET_NET 45]
>  4453 ats       20   0 12.9g 5.3g 4928 S  3.0  8.5   6:44.52 [ET_NET 37]
>  4463 ats       20   0 12.9g 5.3g 4928 S  3.0  8.5   7:54.01 [ET_NET 47]
>  3652 root      39  19     0    0    0 S  1.0  0.0   2:14.15 kipmi0
>  4425 ats       20   0 12.9g 5.3g 4928 S  0.7  8.5   0:19.21 [ET_NET 9]
>  4426 ats       20   0 12.9g 5.3g 4928 S  0.7  8.5   0:19.42 [ET_NET 10]
>  4430 ats       20   0 12.9g 5.3g 4928 S  0.7  8.5   0:19.55 [ET_NET 14]
>   132 root      20   0     0    0    0 S  0.3  0.0   0:00.82 events/1
>  4417 ats       20   0 12.9g 5.3g 4928 S  0.3  8.5   0:19.82 [ET_NET 1]
>  4418 ats       20   0 12.9g 5.3g 4928 S  0.3  8.5   0:20.02 [ET_NET 2]
>  4420 ats       20   0 12.9g 5.3g 4928 S  0.3  8.5   0:19.64 [ET_NET 4]
>  4422 ats       20   0 12.9g 5.3g 4928 S  0.3  8.5   0:19.84 [ET_NET 6]
>  4427 ats       20   0 12.9g 5.3g 4928 S  0.3  8.5   0:19.94 [ET_NET 11]
>  4435 ats       20   0 12.9g 5.3g 4928 S  0.3  8.5   0:19.88 [ET_NET 19]
> {code}
> == Update ===
> I have developed a patch to fix this issue. After I rebase it to master branch, I'll attach it to here. Here is the description of my patch:
> ---------------
> We found that CPU of logging thread could be easy to reach up 100% in
> collation host, but disk IO was low at the same time.
> The bottleneck of logging thread is that some preprocessing job, such as
> convert LogBuffer to ascii text, consume so much CPU time. And more
> worse, the write() operation will block logging thread.
> So this patch try to split logging thread into two parts:
> 1) Configurable preproc threads, which are responsiable for processing all
>    of prepare work, and then forward the preprocessed buffer to flush thread,
>    or send them to CollationClient/HostSM.
> 2) One Flush thread, it will consume preprocessed buffers and write them to
>    disk. In our testing, one flush thread is enough for us.
> TODO: This patch supports only one flush thread, we can improve it to
>       "one flush thread per file/disk" in the future.
> == How to configure ==
> The number of preproc threads is 1 by default.
> Please modify "proxy.config.log.collation_preproc_threads" option to
> change it.
> ---------------
> ======
> After apply this patch, let's have a look the 'top' output:
> {code}
> [root@test81.cn8 core]# top -H -p 3984 -b -n 1
> top - 15:39:06 up 17:23,  6 users,  load average: 3.37, 3.09, 3.15
> Tasks: 246 total,   6 running, 240 sleeping,   0 stopped,   0 zombie
> Cpu(s):  1.4%us,  0.5%sy,  0.0%ni, 97.6%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  65943744k total, 65763084k used,   180660k free,   136688k buffers
> Swap:  2097144k total,        0k used,  2097144k free, 57275428k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4237 ats       20   0 8941m 5.2g 5148 R 39.5  8.3  80:46.23 [LOG_PREPROC 2]
>  4236 ats       20   0 8941m 5.2g 5148 R 37.5  8.3  80:46.53 [LOG_PREPROC 1]
>  4235 ats       20   0 8941m 5.2g 5148 R 35.6  8.3  80:46.71 [LOG_PREPROC 0]
>  4238 ats       20   0 8941m 5.2g 5148 R 35.6  8.3  80:48.37 [LOG_PREPROC 3]
>  4239 ats       20   0 8941m 5.2g 5148 R 35.6  8.3  80:46.58 [LOG_PREPROC 4]
>  4240 ats       20   0 8941m 5.2g 5148 R 11.9  8.3  20:45.28 [LOG_FLUSH]
>  4037 ats       20   0 8941m 5.2g 5148 S  7.9  8.3  15:06.60 [ET_NET 38]
>  4045 ats       20   0 8941m 5.2g 5148 S  5.9  8.3   9:55.74 [ET_NET 46]
>  4038 ats       20   0 8941m 5.2g 5148 S  4.0  8.3   6:08.97 [ET_NET 39]
>  4042 ats       20   0 8941m 5.2g 5148 S  4.0  8.3   8:36.73 [ET_NET 43]
> ...
> {code}
> [LOG_PREPROC 0-4] are collation preproc threads, [LOG_FLUSH] is new flush thread. They have good throughput now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira