Posted to infrastructure-issues@apache.org by "jan iversen (JIRA)" <ji...@apache.org> on 2013/09/02 18:48:51 UTC

[jira] [Created] (INFRA-6714) ooo-wiki2-vm needs more cpu resources.

jan iversen created INFRA-6714:
----------------------------------

             Summary: ooo-wiki2-vm needs more cpu resources.
                 Key: INFRA-6714
                 URL: https://issues.apache.org/jira/browse/INFRA-6714
             Project: Infrastructure
          Issue Type: Improvement
          Components: Website
         Environment: ooo-wiki2-vm, httpd
            Reporter: jan iversen
            Priority: Critical


The AOO wiki is being used by many more users since our 4.0 release, and this has led to massive timeouts.

According to rjung, spiders might be part of the problem. We are trying to block them, but at the same time spiders (especially Google) are important.

It will not help to allow more FPM workers, since the CPU is the bottleneck.

So I am requesting a few more CPU cores. I have asked rjung for help, since he knows a lot more about this than I do, and his analysis is also that the CPU is the limiting factor.

Copy of the IRC discussion:
<rjung> janIV: I had a short look at the aoo wiki yesterday. It seems to me it doesn't have enough CPU resources or, phrased differently, it uses too much CPU resources. So when the load goes up the VM can't cope with it and queues lots of requests until Apache takes PHP out of the proxy.
<janIV> rjung: thx for your analysis, just to be sure, starting more fpm won't solve anything?
<janIV> rjung: we have had the same problem 4 times today.
<rjung> janIV: I think "no". But that's only based on what I saw yesterday.
<janIV> rjung: mind if I refer to these lines in an infra requesting more cores ?
<rjung> janIV: Maybe there's a way to block spiders from expensive requests.
<rjung> janIV: I don't mind, but please quote both parts, also the "phrased differently" part.
<janIV> rjung: I tried to look at blocking, but to be honest my knowledge is not deep enough to make a block depending on cpu load.
<janIV> rjung: I will be fair in the JIRA and make sure you are asked, since you are much more of a specialist on this than I am.
<rjung> janIV: no that would be too complex. Just block things like GET /w/index.php?title=Special:RecentChanges&feed=atom HTTP/1.1 for UAs containing "bot".
<janIV> rjung: I thought you would say that, but e.g. the Google spider is important to AOO (search results), though clearly not when the wiki is overloaded.
<rjung> janIV: You can have a look at the access log. The last column is the response time in microseconds. So "awk '$NF>=15000000' FILENAME" shows all requests taking longer than 15 seconds, etc.
<rjung> But how important is spidering "GET /w/index.php?title=Special:RecentChanges&feed=atom HTTP/1.1"?
<janIV> rjung: NOT important, but can we block a spider on the URL it requests, or only entirely?
<rjung> For instance, today there were 1676 requests for "GET /w/index.php?title=Special:RecentChanges&feed=rss".
<janIV> rjung: Oops, too fast. That URL gives the spider what happened in the last 48 hours; it has an optional timestamp parameter, which they don't use.
<rjung> Of those 1676 none succeeded, because all took longer than the timeout of 30 seconds. I guess they still kept your wiki pretty busy without delivering anything back.
<rjung> Maybe you can optimize the handling of such requests?
<janIV> rjung: can we make a fast patch, and block spiders, while looking for a long-term solution ?
<rjung> We can but you would need to provide the URIs which should be blocked.
<janIV> my problem is that I don't know which are spiders and which are real users; can we see who requests the same URL, say, more than 50 times a day?
<rjung> Not automatically. But I suggest we simply start by assuming anything is a spider that matches /bot/i in the user agent string.
<janIV> can you please make a block for that, then we can later look at enhancing it.
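For illustration, a quick block along the lines rjung describes could look roughly like the httpd configuration below. This is only a sketch from my side, not the rule rjung will actually deploy; it assumes mod_rewrite is available on the VM and only covers the two RecentChanges feed URLs mentioned above.

    # Sketch: refuse the expensive RecentChanges feed URLs to any client
    # whose User-Agent contains "bot" (case-insensitive), per the /bot/i idea.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} bot [NC]
    RewriteCond %{QUERY_STRING} title=Special:RecentChanges [NC]
    RewriteCond %{QUERY_STRING} feed=(atom|rss) [NC]
    RewriteRule ^/w/index\.php$ - [F,L]

Normal page views by the same spiders (including Google) would still be allowed, so indexing of the wiki content should not be affected.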

It's important to the AOO project that our users have access to the wiki and feel we support them. There are others out there using things like this to announce that AOO is practically dead, which of course is far from true.

I hope for a speedy solution to the problem, if nothing else then just to show that AOO is an important project.

thanks in advance
jan I.

PS: For reference, at Oracle the wiki alone had 4 cores on a dedicated server.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira