Posted to ruleqa@spamassassin.apache.org by da...@chaosreigns.com on 2013/03/02 16:36:41 UTC

mass-check made my server unresponsive for five hours?

I woke up to find I couldn't get into my server, and Linode's graphs showed
that it had been swapping severely for five hours.  With enough patience, I
eventually got in, and found:


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26285 bind      20   0  365m 182m 1516 S    4 24.5   2452:37 named
 6578 darxus    20   0 64096  38m 1624 S    0  5.1  18:07.90 mass-check
 9142 darxus    20   0 64672  37m 1568 D    0  5.0   4:43.41 mass-check
 6571 darxus    20   0 62876  35m 1644 S    0  4.7  16:58.64 mass-check
 6575 darxus    20   0 62796  35m 1608 S    0  4.7  14:09.23 mass-check
 6572 darxus    20   0 60940  35m 1660 S    0  4.7  19:00.49 mass-check
 6573 darxus    20   0 62772  34m 1644 S    0  4.7  16:44.67 mass-check
 9145 darxus    20   0 64908  34m 1564 S    0  4.6   7:44.70 mass-check
 6574 darxus    20   0 62340  34m 1620 D   16  4.6  16:45.77 mass-check
 9150 darxus    20   0 62044  34m 1564 D    0  4.6   5:25.69 mass-check
 6576 darxus    20   0 61412  34m 1628 D   12  4.6  16:31.52 mass-check
 9146 darxus    20   0 62632  34m 1564 D    1  4.6   7:32.79 mass-check
 6577 darxus    20   0 61788  33m 1620 S    0  4.5  15:51.70 mass-check
 9155 darxus    20   0 62192  33m 1560 S    0  4.5   7:49.32 mass-check
11813 root      20   0 22744  17m 1112 D    2  2.4   0:00.32 smtpd
10668 postfix   20   0 27560  15m 1016 D    0  2.0   0:02.56 smtpd
10848 postfix   20   0 28052  14m 1092 S    0  1.9   0:02.08 smtpd
10855 postfix   20   0 28044  11m  968 D    0  1.5   0:02.23 smtpd
11820 root      20   0 11780  10m 1884 S    0  1.4   0:00.36 mrtg
11805 nobody    20   0  8764 6916 2060 S    0  0.9   0:00.60 postfix-policyd
11817 nobody    20   0  8368 6880 2088 S    0  0.9   0:00.25 postfix-policyd
11823 nobody    20   0  7640 5992 1916 D    4  0.8   0:00.18 postfix-policyd
10693 postfix   20   0 27524 5576 1100 S    0  0.7   0:01.91 smtpd
11614 nobody    20   0  8764 4780 1740 S    0  0.6   0:00.33 postfix-policyd


procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
17 23 629028   8048   2012  17876 1934  360  2794   418  958  567  2  1  1 96
 0 22 629612   8268   2176  17448 1870  856  2380   864  989  498  3  1  0 96
 0 22 629908   8064   2156  17040 1070  530  1088   586  720  389  2  0  0 97
 0 22 630760   8700   1912  16332 1572  952  1842   952  966  390  4  1  0 94
 0 21 631456   7840   1956  17072 1816  970  2626  1068 1229  558  2  1  0 96
 0 18 631688   8312   1924  17352 1270  546  1660   550  815  472  2  1  3 94
 0 22 632652   7736   1760  16972 1402  876  1558   886  836  511  0  1  3 96

The bind memory usage is about normal, since I mirror dnswl.org with it.

I just removed my mass-check cron job.  Anybody else seeing substantially
increased memory usage from mass-check lately?

I had another bout of swap thrashing not long ago; I now suspect it was
the same thing.

This VM has 747 megabytes of RAM.
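
A rough back-of-envelope from the top snapshot above shows why the box
thrashed: thirteen mass-check workers at roughly 35 MB resident each, plus
named at 182 MB, nearly fill the 747 MB before postfix or anything else
runs.  A minimal sketch of that arithmetic (the per-worker figure is an
eyeballed average from the RES column, not an exact sum):

```shell
# Back-of-envelope: resident memory implied by the top output above (MB).
awk 'BEGIN {
  masscheck = 13 * 35   # ~13 mass-check workers, ~35m RES each
  named     = 182       # bind, mirroring dnswl.org
  total     = masscheck + named
  printf "%d MB of 747 MB, before counting smtpd/policyd\n", total
}'
```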

-- 
"You will need: a big heavy rock, something with a bit of a swing to it...
perhaps Mars" - How to destroy the Earth
http://www.ChaosReigns.com

Re: mass-check made my server unresponsive for five hours?

Posted by Jari Fredriksson <ja...@iki.fi>.
On 02.03.2013 17:41, Axb wrote:
> On 03/02/2013 04:36 PM, darxus@chaosreigns.com wrote:
>> This VM has 747 megabytes of ram.
>
> With that amount of RAM I wouldn't run more than 4 simultaneous jobs.
> Try lowering the job count.
>

My run went as usual today, no problems whatsoever. I have 8 gigabytes of
RAM in the machine doing it, and 1 gigabyte in the DNS server.

-- 

Everything that you know is wrong, but you can be straightened out.



Re: mass-check made my server unresponsive for five hours?

Posted by Axb <ax...@gmail.com>.
On 03/02/2013 04:36 PM, darxus@chaosreigns.com wrote:
> This VM has 747 megabytes of ram.

With that amount of RAM I wouldn't run more than 4 simultaneous jobs.
Try lowering the job count.
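
For anyone following along, a sketch of what lowering the job count looks
like.  I believe mass-check takes a `-j` option for the number of parallel
worker processes; check `./mass-check --help` in your checkout to confirm
the exact spelling.  The corpus paths here are hypothetical placeholders:

```shell
# Run mass-check from the masses/ directory of a SpamAssassin checkout,
# capped at 4 parallel jobs for a ~750 MB box (per Axb's suggestion).
# ~/corpus/ham and ~/corpus/spam are placeholder corpus directories.
cd ~/spamassassin/masses
./mass-check -j=4 --net \
    ham:dir:~/corpus/ham \
    spam:dir:~/corpus/spam
```

Running it niced (`nice -n 19 ionice -c3 ./mass-check ...`) would also help
keep a cron-driven run from starving the MTA, though it won't stop swapping
once the workers outgrow physical memory.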