Posted to users@httpd.apache.org by Mark Jacquet <ma...@yahoo.com.INVALID> on 2015/06/17 00:26:26 UTC

[users@httpd] Hung thread

I am seeing something very odd on our Apache 2.4.12 server  (SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200)
We are using MPM Worker.
I have been watching the scoreboard all day, monitoring system load and running processes/threads.
Around 10 AM the load jumped from a normal < 1 to > 7, then made its way up to > 20, where it has sat all day with 21 threads in status "W".
I traced the threads back to the actual users here at work and asked them what they did. No help there, other than that they both made requests to the server in rapid succession (one "restored" a browser session, the other rapidly clicked some URLs in a Word doc). One user even rebooted for me (no effect on Apache).

In any case I have 21 threads in "W" state.
The server has even gone on to create new processes, leaving these processes behind, each with one or more threads still active. But the load will not drop!
The pstack of a hung process (this one has only one hung thread) looks like this:

3260:   /codeadm/http_servers/httpd/bin/httpd -f /codeadm/http_servers/httpd/c
-----------------  lwp# 1 / thread# 1  --------------------
 ff041714 lwp_wait (10, ffbff2ec)
 ff03d11c _thrp_join (10, 0, ffbff354, 1, ffbff2ec, ff06cbc0) + 34
 ff24fd08 apr_thread_join (ffbff3d4, 1ef320, ff06cbc0, 0, 0, ff3a2000) + 48
 000d4490 join_workers (1ef4a0, 1f4a88, 1, 1eef00, 1eee50, 1883d0) + 2f8
 000d4e80 child_main (2, d1988, ff06cbc0, 0, 0, ff3a2000) + 7f8
 000d50a8 make_child (1883d0, 2, 134518, 7, 0, 1883d0) + 1b0
 000d5cb0 perform_idle_server_maintenance (ffbff69c, ffbff698, ffbff684, 163188, 1883d0, ff3a0140) + a28
 000d6300 server_main_loop (0, 0, 134518, 7, 0, 1883d0) + 548
 000d67e8 worker_run (134518, 18a470, 1883d0, 150000, ff3a0100, ff3a0140) + 490
 0005dd28 ap_run_mpm (163188, 18a470, 1883d0, 1883d0, 0, 0) + a8
 0004e0e0 main     (5, ffbff8cc, ffbff8e4, 150000, ff3a0100, ff3a0140) + 17b0
 0004b3b4 _start   (0, 0, 0, 0, 0, 0) + dc
-----------------  lwp# 16 / thread# 16  --------------------
 ff31dcc4 find_block_by_offset (19c550, 10, d778, 1, 0, 314628) + 8c
 ff31e218 move_block (19c550, d778, 0, 0, 2, 0) + 228
 ff31f44c apr_rmm_calloc (19c550, 18, fe8e4af8, c, 0, 314628) + 1fc
 fe8e07bc util_ald_alloc (fe580670, 18, 0, 0, 2, 0) + 7c
 fe8e1f20 util_ald_cache_insert (fe580670, fd0f9898, fe8e4af8, c, 0, 314628) + 170
 fe8d9d2c uldap_cache_checkuserid (fe8e4af8, 0, 0, 0, 2, 0) + 1044
 fe9e3f74 authn_ldap_check_password (0, fd0f99ac, 31609f, fd0f9998, 80808080, 1010101) + 834
 fe982470 authenticate_basic_user (314628, 0, 3145e8, 8d, 237120, 25aec0) + 608
 0007f750 ap_run_check_user_id (314628, 236e78, 236e78, 2, d, 25aec0) + 90
 000818fc ap_process_request_internal (314628, 0, 3145e8, 8d, 237120, 25aec0) + 6e4
 000c5288 ap_process_async_request (314628, 236e78, 236e78, 2, d, 25aec0) + 638
 000c5428 ap_process_request (314628, 4, 314628, 8d, 237120, 25aec0) + 20
 000bddc0 ap_process_http_sync_connection (237128, 236e78, 236e78, 2, d, 25aec0) + f0
 000bdfbc ap_process_http_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 64
 000ab038 ap_run_process_connection (237128, 236e78, 236e78, 2, d, 25aec0) + 90
 000ab9bc ap_process_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 8c
 000d235c process_socket (1ef320, 236e30, 236e78, 2, d, 25aec0) + ec
 000d373c worker_thread (1ef320, 1f6ef0, 0, 0, 0, 0) + 49c
 ff24f894 dummy_worker (1ef320, fd0fc000, 0, 0, ff24f840, 1) + 54
 ff0404f4 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 17 / thread# 17  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 18 / thread# 18  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 19 / thread# 19  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 20 / thread# 20  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 21 / thread# 21  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 22 / thread# 22  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 23 / thread# 23  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 24 / thread# 24  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 25 / thread# 25  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 26 / thread# 26  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 27 / thread# 27  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
newyahoo%  
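
With 20-odd threads per process, it can help to reduce a pstack dump to "which function is each thread sitting in". The following is only a rough sketch in Python, assuming the Solaris pstack layout shown above (a dashed "lwp# N / thread# N" header followed by frame lines); the function names innermost_frames and summarize are made up for this example.

```python
import re
from collections import Counter

# Header that Solaris pstack prints before each thread's frames.
THREAD_HEADER = re.compile(r"^-+\s+lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s+-+$")
# A frame line: a hex address followed by the function name.
FRAME = re.compile(r"^\s*[0-9a-f]+\s+(\w+)")


def innermost_frames(pstack_text):
    """Map each thread number to the function at the top of its stack."""
    result = {}
    current = None
    for line in pstack_text.splitlines():
        header = THREAD_HEADER.match(line.strip())
        if header:
            current = int(header.group(2))
            continue
        frame = FRAME.match(line)
        # Only the first frame after a header is the innermost one.
        if frame and current is not None and current not in result:
            result[current] = frame.group(1)
    return result


def summarize(pstack_text):
    """Count how many threads are parked in each innermost function."""
    return Counter(innermost_frames(pstack_text).values())
```

Feeding it the dump above would separate the one thread inside the apr_rmm code from the main thread and the exited dummy_worker threads.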



Partial ScoreBoard looks like:
Server Version: Apache/2.4.12 (Unix)
Server MPM: worker
Server Built: Jun 3 2015 17:19:20

Current Time: Tuesday, 16-Jun-2015 15:01:45 PDT
Restart Time: Monday, 08-Jun-2015 14:30:49 PDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 8 days 30 minutes 55 seconds
Server load: 23.09 22.46 21.88
Total accesses: 68346 - Total Traffic: 10.0 GB
CPU Usage: u97541.5 s126.35 cu787.35 cs139.55 - 14.2% CPU load
.0986 requests/sec - 15.1 kB/second - 152.7 kB/request
6 requests currently being processed, 94 idle workers

_____________WW_____W__W_____________W_____W______.............W
....................W..W.W.W...W...W..........W.......WW..W.....
..........W...W..W.W..__________________________________________
________

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

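Counting the states in the scoreboard map by eye is error-prone. A small Python sketch, assuming the map has been copied out of the server-status page as plain text; count_states is a name invented for this example, and the key simply restates the one above.

```python
from collections import Counter

# Scoreboard symbols as listed in the key on the server-status page.
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def count_states(scoreboard):
    """Count each scoreboard symbol, ignoring line breaks and other noise."""
    return Counter(ch for ch in scoreboard if ch in SCOREBOARD_KEY)


if __name__ == "__main__":
    counts = count_states("__W.W_K")
    for symbol, n in counts.most_common():
        print(f"{symbol!r} {SCOREBOARD_KEY[symbol]}: {n}")
```
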
netstat shows some hung connections in "CLOSE_WAIT" state for one of the hosts (but not all) that have hung threads/connections:
newyahoo% netstat | grep clienthostname
newyahoo.WWW         clienthostname.62580 65142      0 49896      0 CLOSE_WAIT
newyahoo.WWW         clienthostname.62579 65142      0 49896      0 CLOSE_WAIT
newyahoo.WWW         clienthostname.62582 65142      0 49896      0 CLOSE_WAIT
newyahoo.WWW         clienthostname.62591 65142      0 49896      0 CLOSE_WAIT
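
CLOSE_WAIT generally means the remote side has closed the connection and the local process has not yet closed its socket, which fits a worker that never returns from the request. To see which clients are affected, the netstat output can be tallied per remote host. This is a sketch in Python that assumes the Solaris-style "host.port" address format shown above, with the state in the last column; count_by_remote_host is a hypothetical helper name.

```python
from collections import Counter


def count_by_remote_host(netstat_text, state="CLOSE_WAIT"):
    """Count connections in the given state, grouped by remote host.

    Assumes Solaris-style netstat lines where the second column is the
    remote address written as host.port and the last column is the state.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[-1] != state:
            continue
        # Split off the trailing port at the last dot: host.port.
        host, _, _port = fields[1].rpartition(".")
        counts[host] += 1
    return counts
```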

Can anyone assist in debugging this? 

I would love to have these threads exit without having to manually restart the server.
Thanks
MJ





Re: [users@httpd] Hung thread

Posted by Dr James Smith <js...@sanger.ac.uk>.
Have you looked at enabling Apache's server-status handler (mod_status) so 
you can see what the last request was on each of these hung threads?

Alternatively if you have something like mod_perl installed one thing 
that you can do is add a handler to warn the PID/request to the error 
logs at the start and end of the requests (with an appropriate tag) then 
you can look at the history of the hung threads to see if there is 
anything consistent with them...

I've previously had threads hang on the request that follows a particular 
request, or under a particular set of circumstances for a particular 
request (an infinite loop or something similar).
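
The analysis side of the start/end logging idea can be sketched in Python. The log line format here is purely an assumption for illustration ("REQ-START pid=... uri=..." and "REQ-END pid=..."); the real tags would be whatever the handler writes. With a threaded MPM the PID alone does not identify a worker, so a thread id would need to be part of the key as well.

```python
import re

# Hypothetical tagged lines written at the start and end of each request.
LINE = re.compile(r"\b(REQ-START|REQ-END) pid=(\d+)(?: uri=(\S+))?")


def unfinished_requests(log_lines):
    """Return {pid: uri} for requests that logged a start but never an end."""
    open_requests = {}
    for line in log_lines:
        match = LINE.search(line)
        if not match:
            continue
        tag, pid, uri = match.group(1), int(match.group(2)), match.group(3)
        if tag == "REQ-START":
            open_requests[pid] = uri
        else:
            # The request finished, so this worker is no longer a suspect.
            open_requests.pop(pid, None)
    return open_requests
```

Whatever is left in the result after a hang is the last request each stuck process picked up, which is the history James suggests comparing.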

HTH

James

On 17/08/2015 20:18, Mark Jacquet wrote:
> Jeff/Community
>
> Getting back to this thread after a long time. We tried many things 
> since this initial issue: moved to Linux, tried the latest 
> apache/apr/apr-util binaries, tried adjusting the configuration, etc. 
> All of this eventually failed in the same way: multiple hung threads 
> eventually overloading the server.
>
> In our current environment we switched to the prefork MPM, thinking that 
> maybe threading was killing us. This seemed to work well until day 20 
> (which seems to be relevant, as we got to day 20 a few times). Today 
> all 200 processes (Max Servers) were launched, and not one would die. All hung.
>
> The root proc is in this state:
>
> $sudo pstack 5362
> #0  0x00000039892e1353 in __select_nocancel () from /lib64/libc.so.6
> #1  0x00007ffff7989025 in apr_sleep () from 
> /codeadm/http_servers/httpd-2.4.16-prefork/lib/libapr-1.so.0
> #2  0x00000000004325ec in ap_wait_or_timeout ()
> #3  0x0000000000469680 in prefork_run ()
> #4  0x000000000043171e in ap_run_mpm ()
> #5  0x000000000042b9e4 in main ()
>
> Typical pstack from a hung proc is
>
> $ sudo pstack 6100
> #0  0x00007ffff7dd4955 in move_block () from 
> /codeadm/http_servers/httpd-2.4.16-prefork/lib/libaprutil-1.so.0
> #1  0x00007ffff7dd50a1 in apr_rmm_calloc () from 
> /codeadm/http_servers/httpd-2.4.16-prefork/lib/libaprutil-1.so.0
> #2  0x00007ffff5f26c66 in util_ald_strdup () from 
> /codeadm/http_servers/httpd/modules/mod_ldap.so
> #3  0x00007ffff5f2628a in util_ldap_search_node_copy () from 
> /codeadm/http_servers/httpd/modules/mod_ldap.so
> #4  0x00007ffff5f27235 in util_ald_cache_insert () from 
> /codeadm/http_servers/httpd/modules/mod_ldap.so
> #5  0x00007ffff5f2352d in uldap_cache_checkuserid () from 
> /codeadm/http_servers/httpd/modules/mod_ldap.so
> #6  0x00007ffff6b459ae in authn_ldap_check_password () from 
> /codeadm/http_servers/httpd/modules/mod_authnz_ldap.so
> #7  0x00007ffff673ae4f in authenticate_basic_user () from 
> /codeadm/http_servers/httpd/modules/mod_auth_basic.so
> #8  0x0000000000441c90 in ap_run_check_user_id ()
> #9  0x00000000004451d2 in ap_process_request_internal ()
> #10 0x00000000004627d8 in ap_process_async_request ()
> #11 0x000000000046294f in ap_process_request ()
> #12 0x000000000045ec9e in ap_process_http_connection ()
> #13 0x00000000004567f0 in ap_run_process_connection ()
> #14 0x000000000046900e in child_main ()
> #15 0x0000000000469264 in make_child ()
> #16 0x0000000000469d87 in prefork_run ()
> #17 0x000000000043171e in ap_run_mpm ()
> #18 0x000000000042b9e4 in main ()
> [jacquet@llbdub0009 logs]$
>
> Running on Red Hat Enterprise Linux Server release 6.6 (Santiago) with 
> httpd-2.4.16-prefork.
>
> Killing off these hung procs only band-aids the situation. New procs 
> also hang (building up slowly now).
> I am going to have to do a full restart of the server.
> My expectation is that the server will be fine again for another 20 days.
>
> Grasping at straws now. Any thoughts on this? Anything to try?
>
> Thanks
> Mj
>
>
>
>
>
> On Thursday, June 18, 2015 7:56 AM, Jeff Trawick <tr...@gmail.com> 
> wrote:
>
>
> On Wed, Jun 17, 2015 at 8:51 PM, Mark Jacquet 
> <mark_jacquet@yahoo.com.invalid> wrote:
>
>     Just another oddity to add to the issue.
>
>     Overnight several more hung threads appeared and the load on the
>     system had jumped into the mid 20s.
>     After killing these, the load did not drop. Looking at the list of
>     running processes, I found httpd processes running, spawned from the
>     original root httpd process, that *were not even displayed* in the
>     scoreboard!!  After killing off these hidden zombies, the load
>     dropped again.
>
>
> What's common about the processes?  Similar backtrace to the first one 
> posted?
>
>
>
>     So now I have to catch and kill two types: Zombies on the
>     scoreboard and hidden zombies.
>
>     And this is cute. Sometimes the zombies hang around so long that
>     when the system gets back to creating a new process for slot #1,
>     if the zombie was originally in that slot, it is displayed there
>     along with its brothers for the new process:
>
>
> "scoreboard squatting"
>
>
>     e.g. Note process 19597 below
>
>     Srv  PID    Acc        M  CPU+SS+Req+Conn+Child+Slot    Client          VHost                     Request
>     1-0  16631  0/33/1320  _  131.22202255280.01.6035.79    10.172.91.217   newyahoo.oak.sap.corp:80  NULL
>     1-0  16631  0/18/1087  _  105.88340736980.00.6926.65    10.172.240.113  www-dse.oak.sap.corp:80   GET /cgi-bin/websql/websql.dir/QTS/bugsheetcont.hts?bugid=74133
>     1-0  16631  0/11/1178  _  76.49589542980.00.5634.78     10.172.91.92    newyahoo.oak.sap.corp:80  NULL
>     1-0  16631  0/32/1295  _  92.17425417130.04.0342.07     10.172.240.113  newyahoo.oak.sap.corp:80  NULL
>     1-0  19597  0/26/1319  W  35.552441700.00.5437.10       10.172.248.87   www-rev.oak.sap.corp:80   GET /cgi-bin/rev.cgi?action=105;id=58037 HTTP/1.1
>     1-0  16631  0/12/1427  _  18.41794100.00.14238.52       10.172.240.113  newyahoo.oak.sap.corp:80  NULL
>     1-0  16631  0/27/1442  _  30.67719695430.00.7835.07     10.172.85.9     newyahoo.oak.sap.corp:80  NULL
>     1-0  16631  0/19/784   _  10.70940630.00.4520.95        10.172.246.203  newyahoo.oak.sap.corp:80  NULL
>     1-0  16631  0/8/1034   _  2.86103144630.00.0124.04      10.172.90.155   newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/99     .  58.943145013820.00.002.15     10.136.66.135   newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/82     .  2181.923144824390.00.001.48   10.162.65.165   www-dse.oak.sap.corp:80   POST /cgi-bin/websql/websql.dir/QTS/bugsescalated.pl?product=AN
>     2-0  -      0/0/162    .  2027.12314509350.00.003.36    10.50.3.99      newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/576    .  1704.40314504100.00.0013.38   10.172.240.113  newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/928    .  1295.363145029750.00.0024.38  10.50.17.221    newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/852    .  1798.52314503810.00.0020.72   10.162.65.165   newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/1084   .  551.293145022210.00.0026.52   10.176.138.162  newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/1180   .  385.833145019630.00.0034.31   10.162.65.197   newyahoo.oak.sap.corp:80  NULL
>     2-0  -      0/0/50     .  50.713145000.00.001.62        10.58.181.166   www-rev.oak.sap.corp:80   GET /cgi-bin/rev.cgi?action=105;id=58051 HTTP/1.1
>     2-0  13761  0/12/1078  W  58.803489600.00.1031.67       10.172.107.38   www-rev.oak.sap.corp:80   POST /cgi-bin/rev.cgi HTTP/1.1
>     2-0  -      0/0/1075   .  1061.5331450790.00.0031.65    10.172.90.88    newyahoo.oak.sap.corp:80  GET /server-status HTTP/1.1
>     2-0  -      0/0/1362   .  46.803145080.00.0039.72       10.172.107.38   www-rev.oak.sap.corp:80   POST /cgi-bin/rev.cgi HTTP/1.1
>     2-0  -      0/0/1142   .  56.693145011490.00.0035.22    10.172.240.113  newyahoo.oak.sap.corp:80  NUL
>     Slot #2 currently not being used (still has zombie)
>
>     MJ
>
>
>
>
>     Mj
>
>
>
>
>     On Tuesday, June 16, 2015 5:42 PM, Mark Jacquet
>     <ma...@yahoo.com.INVALID> wrote:
>
>
>     Upgrade as in Apache upgrade or Solaris 5.10 patch upgrade? :)
>
>     Apache is all new, of course: 2.4.12 with the latest add-on sources
>     (apr, pcre, etc.).
>     The bad news is the OS is not at all up to date. And for reasons I
>     have no control over, I cannot patch.
>     So if this is an OS issue then ......
>
>     I seem to be running with the Sun Native LDAP SDK. Would building
>     against a different LDAP source (OpenLDAP) help?
>
>     Long term plan -> moving all Apache servers to Linux
>
>     Mj
>
>
>
>     On Tuesday, June 16, 2015 5:31 PM, Eric Covener <covener@gmail.com> wrote:
>
>
>     On Tue, Jun 16, 2015 at 8:23 PM, Mark Jacquet
>     <mark_jacquet@yahoo.com.invalid> wrote:
>     > So do you think this hang is related to the native LDAP lib code?
>
>
>     It is possible but IMO not very likely. It has to corrupt memory just
>     enough to put a looping structure in apr_rmm. What's your upgrade
>     history like?
>
>     -- 
>     Eric Covener
>     covener@gmail.com
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
>     For additional commands, e-mail: users-help@httpd.apache.org
>
>
>
>
>
>
>
>
>
> -- 
> Born in Roswell... married an alien...
> http://emptyhammock.com/
>
>
>




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Re: [users@httpd] Hung thread

Posted by Mark Jacquet <ma...@yahoo.com.INVALID>.
Jeff/Community,
Getting back to this thread after a long time. We tried many things since this initial issue: moved to Linux, tried the latest apache/apr/apr-util binaries, tried adjusting the configuration, etc. All of this eventually failed in the same way: multiple hung threads eventually overloading the server.
In our current environment we switched to the prefork MPM, thinking that maybe threading was killing us. This seemed to work well until day 20 (which seems to be relevant, as we got to day 20 a few times). Today all 200 processes (Max Servers) were launched, and not one would die. All hung.

The root proc is in this state:
$sudo pstack 5362
#0  0x00000039892e1353 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007ffff7989025 in apr_sleep () from /codeadm/http_servers/httpd-2.4.16-prefork/lib/libapr-1.so.0
#2  0x00000000004325ec in ap_wait_or_timeout ()
#3  0x0000000000469680 in prefork_run ()
#4  0x000000000043171e in ap_run_mpm ()
#5  0x000000000042b9e4 in main ()
Typical pstack from a hung proc is
$ sudo pstack 6100
#0  0x00007ffff7dd4955 in move_block () from /codeadm/http_servers/httpd-2.4.16-prefork/lib/libaprutil-1.so.0
#1  0x00007ffff7dd50a1 in apr_rmm_calloc () from /codeadm/http_servers/httpd-2.4.16-prefork/lib/libaprutil-1.so.0
#2  0x00007ffff5f26c66 in util_ald_strdup () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#3  0x00007ffff5f2628a in util_ldap_search_node_copy () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#4  0x00007ffff5f27235 in util_ald_cache_insert () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#5  0x00007ffff5f2352d in uldap_cache_checkuserid () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#6  0x00007ffff6b459ae in authn_ldap_check_password () from /codeadm/http_servers/httpd/modules/mod_authnz_ldap.so
#7  0x00007ffff673ae4f in authenticate_basic_user () from /codeadm/http_servers/httpd/modules/mod_auth_basic.so
#8  0x0000000000441c90 in ap_run_check_user_id ()
#9  0x00000000004451d2 in ap_process_request_internal ()
#10 0x00000000004627d8 in ap_process_async_request ()
#11 0x000000000046294f in ap_process_request ()
#12 0x000000000045ec9e in ap_process_http_connection ()
#13 0x00000000004567f0 in ap_run_process_connection ()
#14 0x000000000046900e in child_main ()
#15 0x0000000000469264 in make_child ()
#16 0x0000000000469d87 in prefork_run ()
#17 0x000000000043171e in ap_run_mpm ()
#18 0x000000000042b9e4 in main ()
[jacquet@llbdub0009 logs]$ 

Running on Red Hat Enterprise Linux Server release 6.6 (Santiago) with httpd-2.4.16-prefork.

Killing off these hung procs only band-aids the situation. New procs also hang (building up slowly now).
I am going to have to do a full restart of the server.
My expectation is that the server will be fine again for another 20 days.
Grasping at straws now. Any thoughts on this? Anything to try?
Thanks
Mj
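
Both the Solaris and the Linux backtraces in this thread stop inside the apr_rmm allocator (find_block_by_offset or move_block, under apr_rmm_calloc), and Eric Covener's remark elsewhere in the thread was that a "looping structure" in apr_rmm could produce this. As a rough illustration only (this is not the apr_rmm code, and the offset-keyed dictionary is a made-up stand-in for its block list), a walk over a linked list of blocks never terminates once corruption makes a block point back to an earlier one. Floyd's tortoise-and-hare check detects that condition:

```python
def has_cycle(next_offset, start):
    """Detect a loop in an offset-linked list using Floyd's algorithm.

    next_offset maps a block offset to the offset of the next block;
    0 (or a missing entry) marks the end of the list. This layout is a
    hypothetical simplification, not the real apr_rmm structure.
    """
    def nxt(off):
        return next_offset.get(off, 0)

    slow = fast = start
    while fast != 0 and nxt(fast) != 0:
        slow = nxt(slow)          # advances one block per step
        fast = nxt(nxt(fast))     # advances two blocks per step
        if slow == fast:
            return True           # the two cursors met: the list loops
    return False


if __name__ == "__main__":
    healthy = {16: 48, 48: 96, 96: 0}
    corrupted = {16: 48, 48: 96, 96: 48}   # last block points backwards
    print(has_cycle(healthy, 16), has_cycle(corrupted, 16))
```

A plain "follow next until 0" loop over the corrupted list would spin forever at full CPU, which is at least consistent with the load figures reported here.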







  

Re: [users@httpd] Hung thread

Posted by Mark Jacquet <ma...@yahoo.com.INVALID>.
Yes, each hung thread shows the exact same backtrace, and each was spawned by a request to the same CGI script but with differing arguments.
There is an LDAP login requirement for this tool. Not sure if that is interesting as many other tools on this same server require LDAP authentication as well.

MJ
 




  

Re: [users@httpd] Hung thread

Posted by Jeff Trawick <tr...@gmail.com>.
On Wed, Jun 17, 2015 at 8:51 PM, Mark Jacquet <
mark_jacquet@yahoo.com.invalid> wrote:

> Just another oddity to add to the issue.
>
> Overnight several more hung threads appeared and the load on the system
> had jumped into the mid 20s.
> After killing these, the load did not drop. Looking at the list of running
> processes, I found httpd processes running, spawned from the original root
> httpd process, that *were not even displayed* in the scoreboard!!  After
> killing off these hidden zombies, the load dropped again.
>

What's common about the processes?  Similar backtrace to the first one
posted?




>
> So now I have to catch and kill two types: Zombies on the scoreboard and
> hidden zombies.
>
> And this is cute. Sometimes the zombies hang around so long that when the
> system gets back to creating a new process for slot #1, if the zombie was
> originally in that slot, it is displayed there along with its brothers for
> the new process:
>
>
"scoreboard squatting"


> e.g. Note process 19597 below
>
> Srv  PID    Acc        M  CPU+SS+Req+Conn+Child+Slot    Client          VHost                     Request
> 1-0  16631  0/33/1320  _  131.22202255280.01.6035.79    10.172.91.217   newyahoo.oak.sap.corp:80  NULL
> 1-0  16631  0/18/1087  _  105.88340736980.00.6926.65    10.172.240.113  www-dse.oak.sap.corp:80   GET /cgi-bin/websql/websql.dir/QTS/bugsheetcont.hts?bugid=74133
> 1-0  16631  0/11/1178  _  76.49589542980.00.5634.78     10.172.91.92    newyahoo.oak.sap.corp:80  NULL
> 1-0  16631  0/32/1295  _  92.17425417130.04.0342.07     10.172.240.113  newyahoo.oak.sap.corp:80  NULL
> 1-0  19597  0/26/1319  W  35.552441700.00.5437.10       10.172.248.87   www-rev.oak.sap.corp:80   GET /cgi-bin/rev.cgi?action=105;id=58037 HTTP/1.1
> 1-0  16631  0/12/1427  _  18.41794100.00.14238.52       10.172.240.113  newyahoo.oak.sap.corp:80  NULL
> 1-0  16631  0/27/1442  _  30.67719695430.00.7835.07     10.172.85.9     newyahoo.oak.sap.corp:80  NULL
> 1-0  16631  0/19/784   _  10.70940630.00.4520.95        10.172.246.203  newyahoo.oak.sap.corp:80  NULL
> 1-0  16631  0/8/1034   _  2.86103144630.00.0124.04      10.172.90.155   newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/99     .  58.943145013820.00.002.15     10.136.66.135   newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/82     .  2181.923144824390.00.001.48   10.162.65.165   www-dse.oak.sap.corp:80   POST /cgi-bin/websql/websql.dir/QTS/bugsescalated.pl?product=AN
> 2-0  -      0/0/162    .  2027.12314509350.00.003.36    10.50.3.99      newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/576    .  1704.40314504100.00.0013.38   10.172.240.113  newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/928    .  1295.363145029750.00.0024.38  10.50.17.221    newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/852    .  1798.52314503810.00.0020.72   10.162.65.165   newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/1084   .  551.293145022210.00.0026.52   10.176.138.162  newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/1180   .  385.833145019630.00.0034.31   10.162.65.197   newyahoo.oak.sap.corp:80  NULL
> 2-0  -      0/0/50     .  50.713145000.00.001.62        10.58.181.166   www-rev.oak.sap.corp:80   GET /cgi-bin/rev.cgi?action=105;id=58051 HTTP/1.1
> 2-0  13761  0/12/1078  W  58.803489600.00.1031.67       10.172.107.38   www-rev.oak.sap.corp:80   POST /cgi-bin/rev.cgi HTTP/1.1
> 2-0  -      0/0/1075   .  1061.5331450790.00.0031.65    10.172.90.88    newyahoo.oak.sap.corp:80  GET /server-status HTTP/1.1
> 2-0  -      0/0/1362   .  46.803145080.00.0039.72       10.172.107.38   www-rev.oak.sap.corp:80   POST /cgi-bin/rev.cgi HTTP/1.1
> 2-0  -      0/0/1142   .  56.693145011490.00.0035.22    10.172.240.113  newyahoo.oak.sap.corp:80  NUL
> Slot #2 currently not being used (still has zombie)
>
> MJ


-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: [users@httpd] Hung thread

Posted by Mark Jacquet <ma...@yahoo.com.INVALID>.
Just another oddity to add to the issue.
Overnight several more hung threads appeared and the load on the system jumped into the mid-20s. After killing these, the load did not drop. Looking at the list of running processes, I found httpd processes running, spawned from the original root httpd process, that *were not even displayed* in the scoreboard! After killing these hidden zombies, the load dropped again.
So now I have to catch and kill two types: zombies on the scoreboard and hidden zombies.
And this is cute: sometimes the zombies hang around so long that when the system gets back to creating a new process for slot #1, a zombie that was originally in that slot is displayed there along with its brothers from the new process.
E.g., note process 19597 below:

1-0 16631 0/33/1320 _ 131.22202255280.01.6035.79 10.172.91.217 newyahoo.oak.sap.corp:80 NULL
1-0 16631 0/18/1087 _ 105.88340736980.00.6926.65 10.172.240.113 www-dse.oak.sap.corp:80 GET /cgi-bin/websql/websql.dir/QTS/bugsheetcont.hts?bugid=74133
1-0 16631 0/11/1178 _ 76.49589542980.00.5634.78 10.172.91.92 newyahoo.oak.sap.corp:80 NULL
1-0 16631 0/32/1295 _ 92.17425417130.04.0342.07 10.172.240.113 newyahoo.oak.sap.corp:80 NULL
1-0 19597 0/26/1319 W 35.552441700.00.5437.10 10.172.248.87 www-rev.oak.sap.corp:80 GET /cgi-bin/rev.cgi?action=105;id=58037 HTTP/1.1
1-0 16631 0/12/1427 _ 18.41794100.00.14238.52 10.172.240.113 newyahoo.oak.sap.corp:80 NULL
1-0 16631 0/27/1442 _ 30.67719695430.00.7835.07 10.172.85.9 newyahoo.oak.sap.corp:80 NULL
1-0 16631 0/19/784 _ 10.70940630.00.4520.95 10.172.246.203 newyahoo.oak.sap.corp:80 NULL
1-0 16631 0/8/1034 _ 2.86103144630.00.0124.04 10.172.90.155 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/99 . 58.943145013820.00.002.15 10.136.66.135 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/82 . 2181.923144824390.00.001.48 10.162.65.165 www-dse.oak.sap.corp:80 POST /cgi-bin/websql/websql.dir/QTS/bugsescalated.pl?product=AN
2-0 - 0/0/162 . 2027.12314509350.00.003.36 10.50.3.99 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/576 . 1704.40314504100.00.0013.38 10.172.240.113 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/928 . 1295.363145029750.00.0024.38 10.50.17.221 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/852 . 1798.52314503810.00.0020.72 10.162.65.165 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/1084 . 551.293145022210.00.0026.52 10.176.138.162 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/1180 . 385.833145019630.00.0034.31 10.162.65.197 newyahoo.oak.sap.corp:80 NULL
2-0 - 0/0/50 . 50.713145000.00.001.62 10.58.181.166 www-rev.oak.sap.corp:80 GET /cgi-bin/rev.cgi?action=105;id=58051 HTTP/1.1
2-0 13761 0/12/1078 W 58.803489600.00.1031.67 10.172.107.38 www-rev.oak.sap.corp:80 POST /cgi-bin/rev.cgi HTTP/1.1
2-0 - 0/0/1075 . 1061.5331450790.00.0031.65 10.172.90.88 newyahoo.oak.sap.corp:80 GET /server-status HTTP/1.1
2-0 - 0/0/1362 . 46.803145080.00.0039.72 10.172.107.38 www-rev.oak.sap.corp:80 POST /cgi-bin/rev.cgi HTTP/1.1
2-0 - 0/0/1142 . 56.693145011490.00.0035.22 10.172.240.113 newyahoo.oak.sap.corp:80 NUL
Slot #2 currently not being used (still has zombie)
MJ
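
The two kinds of leftovers described above (stuck processes still on the scoreboard, and httpd children missing from it) could be told apart mechanically. What follows is only a minimal sketch, assuming the (pid, ppid, command) rows have already been collected from something like `ps -e -o pid,ppid,comm` and that the PIDs shown on the status page are known. The function name, the parent PID 2604, and every PID other than 19597 are invented for illustration.

```python
def find_hidden_children(ps_rows, parent_pid, scoreboard_pids):
    """Return httpd PIDs that are children of parent_pid but absent from the scoreboard."""
    hidden = []
    for pid, ppid, command in ps_rows:
        # Compare only the executable name, so a full path still matches.
        is_httpd = command.rsplit("/", 1)[-1] == "httpd"
        if is_httpd and ppid == parent_pid and pid not in scoreboard_pids:
            hidden.append(pid)
    return sorted(hidden)


# Hypothetical snapshot: 2604 stands in for the root httpd process;
# 20001 and 20002 are invented children that the scoreboard does not list.
ps_rows = [
    (2604, 1, "/codeadm/http_servers/httpd/bin/httpd"),
    (19597, 2604, "/codeadm/http_servers/httpd/bin/httpd"),
    (30010, 2604, "/codeadm/http_servers/httpd/bin/httpd"),
    (20001, 2604, "/codeadm/http_servers/httpd/bin/httpd"),
    (20002, 2604, "/codeadm/http_servers/httpd/bin/httpd"),
    (4242, 1, "/usr/sbin/cron"),
]
scoreboard_pids = {19597, 30010}  # PIDs visible on the status page

print(find_hidden_children(ps_rows, 2604, scoreboard_pids))
```

Anything this reports would still need a manual look (for example with pstack) before being killed; the sketch only narrows down which PIDs to inspect.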




Re: [users@httpd] Hung thread

Posted by Mark Jacquet <ma...@yahoo.com.INVALID>.
Upgrade as in Apache upgrade or Solaris 5.10 patch upgrade? :)
Apache is all new, of course: 2.4.12 with the latest add-on sources (apr, pcre, etc.). The bad news is that the OS is not at all up to date, and for reasons I have no control over, I cannot patch. So if this is an OS issue, then ...
I seem to be running with the Sun native LDAP SDK. Would building against a different LDAP source (OpenLDAP?) help?

Long-term plan: move all Apache servers to Linux.
Mj
 





  

Re: [users@httpd] Hung thread

Posted by Eric Covener <co...@gmail.com>.
On Tue, Jun 16, 2015 at 8:23 PM, Mark Jacquet
<ma...@yahoo.com.invalid> wrote:
> So do you think this hang is related to the native LDAP lib code?

It is possible but IMO not very likely. It would have to corrupt memory just
enough to put a looping structure in apr_rmm.  What's your upgrade
history like?

-- 
Eric Covener
covener@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Hung thread

Posted by Mark Jacquet <ma...@yahoo.com.INVALID>.
I just did a test and killed off 4 of the 6 processes with multiple threads stuck in the same place. After each kill the "W"s went away (procs gone from the scoreboard) and the load went down. The good news is that the server stayed up and seems to be running fine.
So do you think this hang is related to the native LDAP lib code?
I built Apache/APR using:

[Mon Jun 08 14:30:49.297984 2015] [ldap:info] [pid 2604:tid 1] AH01318: APR LDAP: Built with Sun Microsystems Inc. LDAP SDK

I could download a different LDAP library (OpenLDAP?) and rebuild with that.
MJ
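
Checking whether the "W"s really go away after each kill can be done by counting states in the scoreboard string instead of eyeballing it. This is a minimal sketch, assuming the scoreboard characters have been copied out of the status page as text (for instance from the Scoreboard line of the machine-readable `/server-status?auto` output, if that is enabled). The function name `count_states` is made up for the example.

```python
from collections import Counter

# Scoreboard rows copied from the status page quoted in this thread.
SCOREBOARD = (
    "_____________WW_____W__W_____________W_____W______.............W"
    "....................W..W.W.W...W...W..........W.......WW..W....."
    "..........W...W..W.W..__________________________________________"
    "________"
)


def count_states(scoreboard):
    """Count how many worker slots are in each scoreboard state."""
    # Ignore whitespace so a scoreboard pasted across several lines still works.
    return Counter(ch for ch in scoreboard if not ch.isspace())


counts = count_states(SCOREBOARD)
print("sending reply (W):", counts["W"])
print("waiting for connection (_):", counts["_"])
print("open slot (.):", counts["."])
```

Running this before and after killing a stuck process gives a quick way to compare the "W" count between two snapshots.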
 




  

Re: [users@httpd] Hung thread

Posted by Jeff Trawick <tr...@gmail.com>.
On Jun 16, 2015 18:26, "Mark Jacquet" <ma...@yahoo.com.invalid> wrote:
>
> I am seeing something very odd on our Apache 2.4.12 server  (SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200)
> We are using MPM Worker.
>
> I have been watching the scoreboard all day monitoring system load and running processes/threads.
> Around 10AM the load jumped from a normal < 1 to >7 then made its way up to >20 where it has sat all day with 21 threads in status "W"
> I traced the threads back to the actual users here at work and asked them what they did, etc. No help there other than they both rapidly made requests to the server (one "restored" a browser session, the other rapidly clicked some URLs in a Word doc). One user even rebooted for me (no effect on Apache)
>
> In any case I have 21 threads in "W" state.
>
> The server has even gone on and created new processes, leaving these procs behind open with one or more threads active. But the load will not drop!
>
> Pstack of a hung process, this one only has one hung thread, looks like this:
>
>
> 3260:   /codeadm/http_servers/httpd/bin/httpd -f /codeadm/http_servers/httpd/c
> -----------------  lwp# 1 / thread# 1  --------------------
>  ff041714 lwp_wait (10, ffbff2ec)
>  ff03d11c _thrp_join (10, 0, ffbff354, 1, ffbff2ec, ff06cbc0) + 34
>  ff24fd08 apr_thread_join (ffbff3d4, 1ef320, ff06cbc0, 0, 0, ff3a2000) + 48
>  000d4490 join_workers (1ef4a0, 1f4a88, 1, 1eef00, 1eee50, 1883d0) + 2f8
>  000d4e80 child_main (2, d1988, ff06cbc0, 0, 0, ff3a2000) + 7f8
>  000d50a8 make_child (1883d0, 2, 134518, 7, 0, 1883d0) + 1b0
>  000d5cb0 perform_idle_server_maintenance (ffbff69c, ffbff698, ffbff684, 163188, 1883d0, ff3a0140) + a28
>  000d6300 server_main_loop (0, 0, 134518, 7, 0, 1883d0) + 548
>  000d67e8 worker_run (134518, 18a470, 1883d0, 150000, ff3a0100, ff3a0140) + 490
>  0005dd28 ap_run_mpm (163188, 18a470, 1883d0, 1883d0, 0, 0) + a8
>  0004e0e0 main     (5, ffbff8cc, ffbff8e4, 150000, ff3a0100, ff3a0140) + 17b0
>  0004b3b4 _start   (0, 0, 0, 0, 0, 0) + dc
> -----------------  lwp# 16 / thread# 16  --------------------
>  ff31dcc4 find_block_by_offset (19c550, 10, d778, 1, 0, 314628) + 8c
>  ff31e218 move_block (19c550, d778, 0, 0, 2, 0) + 228
>  ff31f44c apr_rmm_calloc (19c550, 18, fe8e4af8, c, 0, 314628) + 1fc
>  fe8e07bc util_ald_alloc (fe580670, 18, 0, 0, 2, 0) + 7c
>  fe8e1f20 util_ald_cache_insert (fe580670, fd0f9898, fe8e4af8, c, 0, 314628) + 170
>  fe8d9d2c uldap_cache_checkuserid (fe8e4af8, 0, 0, 0, 2, 0) + 1044
>  fe9e3f74 authn_ldap_check_password (0, fd0f99ac, 31609f, fd0f9998, 80808080, 1010101) + 834
>  fe982470 authenticate_basic_user (314628, 0, 3145e8, 8d, 237120, 25aec0) + 608
>  0007f750 ap_run_check_user_id (314628, 236e78, 236e78, 2, d, 25aec0) + 90
>  000818fc ap_process_request_internal (314628, 0, 3145e8, 8d, 237120, 25aec0) + 6e4
>  000c5288 ap_process_async_request (314628, 236e78, 236e78, 2, d, 25aec0) + 638
>  000c5428 ap_process_request (314628, 4, 314628, 8d, 237120, 25aec0) + 20
>  000bddc0 ap_process_http_sync_connection (237128, 236e78, 236e78, 2, d, 25aec0) + f0
>  000bdfbc ap_process_http_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 64
>  000ab038 ap_run_process_connection (237128, 236e78, 236e78, 2, d, 25aec0) + 90
>  000ab9bc ap_process_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 8c
>  000d235c process_socket (1ef320, 236e30, 236e78, 2, d, 25aec0) + ec
>  000d373c worker_thread (1ef320, 1f6ef0, 0, 0, 0, 0) + 49c
>  ff24f894 dummy_worker (1ef320, fd0fc000, 0, 0, ff24f840, 1) + 54
>  ff0404f4 _lwp_start (0, 0, 0, 0, 0, 0)
> -----------------  lwp# 17 / thread# 17  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 18 / thread# 18  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 19 / thread# 19  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 20 / thread# 20  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 21 / thread# 21  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 22 / thread# 22  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 23 / thread# 23  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 24 / thread# 24  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 25 / thread# 25  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 26 / thread# 26  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 27 / thread# 27  --------------------
>  ff24f840 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> newyahoo%
>
>
>
> Partial ScoreBoard looks like:
>
> Server Version: Apache/2.4.12 (Unix)
> Server MPM: worker
> Server Built: Jun 3 2015 17:19:20
>
> Current Time: Tuesday, 16-Jun-2015 15:01:45 PDT
> Restart Time: Monday, 08-Jun-2015 14:30:49 PDT
> Parent Server Config. Generation: 1
> Parent Server MPM Generation: 0
> Server uptime: 8 days 30 minutes 55 seconds
> Server load: 23.09 22.46 21.88
> Total accesses: 68346 - Total Traffic: 10.0 GB
> CPU Usage: u97541.5 s126.35 cu787.35 cs139.55 - 14.2% CPU load
> .0986 requests/sec - 15.1 kB/second - 152.7 kB/request
> 6 requests currently being processed, 94 idle workers
>
> _____________WW_____W__W_____________W_____W______.............W
> ....................W..W.W.W...W...W..........W.......WW..W.....
> ..........W...W..W.W..__________________________________________
> ________
>
> Scoreboard Key:
> "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
> "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
> "C" Closing connection, "L" Logging, "G" Gracefully finishing,
> "I" Idle cleanup of worker, "." Open slot with no current process
>
>
> Netstat shows some hung connections in "CLOSE_WAIT" state for one of the hosts (but not all) that have hung threads/connections:
>
> newyahoo% netstat | grep clienthostname
> newyahoo.WWW         clienthostname.62580 65142      0 49896      0 CLOSE_WAIT
> newyahoo.WWW         clienthostname.62579 65142      0 49896      0 CLOSE_WAIT
> newyahoo.WWW         clienthostname.62582 65142      0 49896      0 CLOSE_WAIT
> newyahoo.WWW         clienthostname.62591 65142      0 49896      0 CLOSE_WAIT
>
>
> Can anyone assist in debugging this?
>
> I would love to have these threads exit without having to manually restart the server.

All threads with zombie status (all but 2) have already exited.  There is
just #16 stuck in LDAP and the main thread waiting for it to exit.

I don't think that this process could result in more than one non-idle
thread in the status display.

If the process is using CPU and this is really stuck here for a while, then
I guess the thread in LDAP is looping, and it doesn't make things worse to
kill the process, but perhaps there is corruption in shared memory already
and threads in other processes will be affected if they aren't already.  Be
ready to restart if threads keep getting stuck in the same place.
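
The reading above (every thread marked zombie has already exited, so only the remaining threads matter) can be applied to a whole pstack dump with a small parser. This is a minimal sketch, assuming the output has the same layout as the dump quoted in this message; the function name `summarize_pstack` and the shortened sample are illustrative, not part of any Apache or Solaris tool.

```python
import re

# "-----  lwp# 16 / thread# 16  -----" separator lines
HEADER = re.compile(r"^-+\s+lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s+-+$")
# "ff31dcc4 find_block_by_offset (...)" frame lines: address, then symbol name
FRAME = re.compile(r"^[0-9a-f]+\s+([A-Za-z_]\w*)")


def summarize_pstack(text):
    """Map thread number -> {'zombie': bool, 'top_frame': str or None}."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = int(header.group(2))
            threads[current] = {"zombie": False, "top_frame": None}
            continue
        if current is None or not line:
            continue  # skip the command line printed before the first thread
        if line.startswith("** zombie"):
            threads[current]["zombie"] = True
            continue
        frame = FRAME.match(line)
        if frame and threads[current]["top_frame"] is None:
            # The first frame listed is the innermost call of that thread.
            threads[current]["top_frame"] = frame.group(1)
    return threads


# Shortened excerpt of the dump quoted above.
SAMPLE = """\
3260:   /codeadm/http_servers/httpd/bin/httpd -f /codeadm/http_servers/httpd/c
-----------------  lwp# 1 / thread# 1  --------------------
 ff041714 lwp_wait (10, ffbff2ec)
 ff03d11c _thrp_join (10, 0, ffbff354, 1, ffbff2ec, ff06cbc0) + 34
-----------------  lwp# 16 / thread# 16  --------------------
 ff31dcc4 find_block_by_offset (19c550, 10, d778, 1, 0, 314628) + 8c
 ff31e218 move_block (19c550, d778, 0, 0, 2, 0) + 228
-----------------  lwp# 17 / thread# 17  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
"""

live = {
    tid: info["top_frame"]
    for tid, info in summarize_pstack(SAMPLE).items()
    if not info["zombie"]
}
print(live)
```

Run against dumps from several stuck processes, a summary like this would show quickly whether the live threads are all sitting in the same apr_rmm frames.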

>
> Thanks
> MJ
>
>
>
>