Posted to infrastructure-issues@apache.org by "Henri Yandell (JIRA)" <ji...@apache.org> on 2007/10/02 18:22:51 UTC

[jira] Assigned: (INFRA-1334) Intermittent Out Of Memory failures

     [ https://issues.apache.org/jira/browse/INFRA-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell reassigned INFRA-1334:
------------------------------------

    Assignee: Henri Yandell

> Intermittent Out Of Memory failures
> -----------------------------------
>
>                 Key: INFRA-1334
>                 URL: https://issues.apache.org/jira/browse/INFRA-1334
>             Project: Infrastructure
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: JIRA
>            Reporter: Ted Husted
>            Assignee: Henri Yandell
>
> [Summarized from an email thread]
> Occasionally, JIRA on Brutus will fail. Sometimes it seems to "spin out of control", knocking other systems offline, including Bugzilla. At other times, though, the other JIRA instances have stayed responsive even when one fails. 
> It's possible that we are seeing both heap OOMs and PermGen OOMs. The most recent set were due to some kind of client tool sending a malformed request that ended up requesting huge amounts of data and consuming large amounts of memory. It's not clear whether the memory usage is a leak or simply the result of something that isn't coded to pipeline the data.
> Client tools account for roughly half of the current JIRA traffic:
>  % wc -l issues_weblog 
>  738289
>  % grep -c soapservice issues_weblog 
>  372529
> The underlying problem may be that ill-conceived search requests can consume too much memory and bring the system down. (For other services, like Subversion, we've found ways to keep the CPUs in check.)
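> As a purely illustrative sketch of that kind of mitigation (nothing JIRA or Atlassian ships; the class name and the limit of 10 are made up), a servlet filter in front of the SOAP endpoint could cap the number of concurrent soapservice requests, so a burst of expensive searches degrades into 503 responses instead of an OOM:
>
>   import java.io.IOException;
>   import java.util.concurrent.Semaphore;
>   import javax.servlet.Filter;
>   import javax.servlet.FilterChain;
>   import javax.servlet.FilterConfig;
>   import javax.servlet.ServletException;
>   import javax.servlet.ServletRequest;
>   import javax.servlet.ServletResponse;
>   import javax.servlet.http.HttpServletResponse;
>
>   /** Caps concurrent SOAP requests; surplus callers get a 503 instead of piling up on the heap. */
>   public class SoapThrottleFilter implements Filter {
>       private final Semaphore permits = new Semaphore(10);  // hypothetical limit
>
>       public void init(FilterConfig config) {}
>
>       public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
>               throws IOException, ServletException {
>           if (!permits.tryAcquire()) {
>               ((HttpServletResponse) res).sendError(503, "Too many concurrent SOAP requests");
>               return;
>           }
>           try {
>               chain.doFilter(req, res);
>           } finally {
>               permits.release();
>           }
>       }
>
>       public void destroy() {}
>   }
>
> Mapped in web.xml to whatever URL pattern the soapservice requests come in on, this would at least turn runaway client traffic into a visible, recoverable error.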
> These requests are being made by tools like Mylyn, which some developers find very useful; the tools tighten the integration between development environments and the issue tracker. A page like this
>  * http://velocity.apache.org/engine/releases/velocity-1.5/jira-report.html
> is being built straight from the issue tracker.
> It is important that we take steps: we can't trust a system that's starting to throw OOMs, since they could potentially lead to bad data.
> Since there seems to be more than one type of error condition, whenever a JIRA instance goes down the first thing we should check is whether the other instances are responsive. We need, for example, to isolate issues with the httpd front end. 
> We might want to look at setting up the Java Service Wrapper (http://wrapper.tanukisoftware.org) for our JVM(s); it's available for both Linux and Solaris. One thing the Wrapper can do is monitor the JVM output for such things as OOM messages. Having the Wrapper shut the JVM down and send a mail might help us isolate the problem. If possible, it might also be helpful if the Wrapper could monitor JMX events within the monitored JVM, so that it can take proactive action.
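> As a rough sketch of the JMX side of that idea (class name invented, the 90% threshold is arbitrary, and wiring it into the Wrapper is left open), a small listener registered inside the JIRA JVM could arm heap usage thresholds and log when they are crossed:
>
>   import java.lang.management.ManagementFactory;
>   import java.lang.management.MemoryMXBean;
>   import java.lang.management.MemoryNotificationInfo;
>   import java.lang.management.MemoryPoolMXBean;
>   import java.lang.management.MemoryType;
>   import javax.management.Notification;
>   import javax.management.NotificationEmitter;
>   import javax.management.NotificationListener;
>
>   /** Arms a usage threshold on each heap pool and logs when it is crossed. */
>   public class HeapThresholdMonitor {
>       public static void install() {
>           MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
>           ((NotificationEmitter) memory).addNotificationListener(new NotificationListener() {
>               public void handleNotification(Notification n, Object handback) {
>                   if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
>                       // A real hook would mail the infra list or signal the Wrapper to restart.
>                       System.err.println("Heap usage threshold exceeded: " + n.getMessage());
>                   }
>               }
>           }, null, null);
>
>           for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
>               if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
>                   long max = pool.getUsage().getMax();
>                   if (max > 0) {
>                       pool.setUsageThreshold((long) (max * 0.9));  // notify at 90% of the pool
>                   }
>               }
>           }
>       }
>   }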
> The JIRA searches are based on Lucene. We might want to consider asking one of our ASF Lucene experts to review the Atlassian source code. 
> Aside from searches, another cause could be something like DNS resolution via InetAddress. There is a known issue with Java's DNS cache that can leak memory for each IP that hits the server. Any server application that does DNS resolution should be using dnsjava, which has a proper TTL-managed cache. There is also a JVM-specific workaround that can be used in the meantime (sketched below).
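> For reference, that JVM-specific workaround amounts to bounding the JDK's lookup cache before the first resolution happens; the 60/10 second values below are only examples, and dnsjava's resolver classes would replace InetAddress entirely if we went that route:
>
>   import java.net.InetAddress;
>   import java.security.Security;
>
>   public class DnsCacheTtl {
>       public static void main(String[] args) throws Exception {
>           // Cap how long the JDK keeps successful and failed lookups in its
>           // internal InetAddress cache (by default entries can be held much
>           // longer, or indefinitely, depending on JVM version and settings).
>           // Must run before the first lookup in the JVM. On Sun JVMs the
>           // command-line flag -Dsun.net.inetaddr.ttl=60 has a similar effect.
>           Security.setProperty("networkaddress.cache.ttl", "60");
>           Security.setProperty("networkaddress.cache.negative.ttl", "10");
>
>           InetAddress addr = InetAddress.getByName("issues.apache.org");
>           System.out.println(addr.getHostAddress());
>       }
>   }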

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.