Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2014/05/11 00:07:35 UTC

[jira] [Commented] (JENA-689) Fuseki/TDB memory leak for concurrent updates/queries

    [ https://issues.apache.org/jira/browse/JENA-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992757#comment-13992757 ] 

Andy Seaborne commented on JENA-689:
------------------------------------

What is going on is as Rob has said.  The transaction log can only be written back to the database when the database is not in use, and having large numbers of read transactions keeps it in use.  When readers run in parallel, by the time one finishes another is already running, so there is never a point in time at which the database engine can clear up the committed backlog.
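
To make the pattern concrete, here is a minimal sketch (mine, not from the report) of readers that always overlap, using the TDB API with a made-up database location; the package names are those of the Jena 2.x / TDB 1.x line:

{code:java}
// Sketch only: two reader threads that loop without pausing, so the dataset is
// effectively never idle and the journal cannot be flushed.  "DB" is a
// hypothetical database location.
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;

public class OverlappingReaders {
    public static void main(String[] args) {
        final Dataset ds = TDBFactory.createDataset("DB");
        Runnable reader = new Runnable() {
            public void run() {
                while (true) {
                    ds.begin(ReadWrite.READ);
                    try {
                        ds.getDefaultModel().size();   // any read work at all
                    } finally {
                        ds.end();
                    }
                    // No pause: by the time this reader re-enters, the other one
                    // is usually already inside, so the active-transaction count
                    // rarely reaches zero.
                }
            }
        };
        new Thread(reader).start();
        new Thread(reader).start();
    }
}
{code}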

While the transaction log is itself on disk, it is a write-ahead log.  What is written is the log of changes, i.e. the new state, not the old state being replaced.  The view of the database after the commit point is also held in memory until there is a chance to flush the transaction log to the main database.  It is this in-memory view that causes the OOME when there is no chance to flush the log.
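
For illustration, a sketch of the commit path from the API side (the location and the triple are made up); the point is that commit() returns once the change is in the journal, and the new state then has to be retained until it can be written back:

{code:java}
// Sketch only: a single write transaction, with a made-up location and triple.
// commit() appends the change to the write-ahead journal and returns; the
// committed state must then also be kept around until the journal can be
// replayed into the main database, which needs a moment with no active
// transactions.
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.ResourceFactory;
import com.hp.hpl.jena.tdb.TDBFactory;

public class CommitToJournal {
    public static void main(String[] args) {
        Dataset ds = TDBFactory.createDataset("DB");
        ds.begin(ReadWrite.WRITE);
        try {
            ds.getDefaultModel().add(
                ResourceFactory.createResource("http://example/s"),
                ResourceFactory.createProperty("http://example/p"),
                ResourceFactory.createPlainLiteral("o"));
            ds.commit();   // durable in the journal from here on
        } finally {
            ds.end();
        }
        // If readers were active in other threads at this point, the committed
        // state would stay in the journal (and in memory) until the last of
        // them finished.
    }
}
{code}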

The most extreme case of this is a single read transaction that runs for a very long time.  There is little that can be done about that.  The Fuseki case is not quite that bad: it is reasonable to assume that it is not one long-running reader but many overlapping ones that mean there is never a chance to clean up.

It's the thread of the last reader to leave the transaction manager that does the actual write-back to the database.
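
As an illustration of the idea only (this is not the TDB code), "last one out does the write-back" looks roughly like this:

{code:java}
// Illustration of the idea, not the TDB implementation: the reader that brings
// the active-transaction count down to zero does the write-back on its own thread.
import java.util.concurrent.atomic.AtomicInteger;

class LastReaderFlushes {
    private final AtomicInteger active = new AtomicInteger(0);

    void beginRead() { active.incrementAndGet(); }

    void endRead() {
        if (active.decrementAndGet() == 0) {
            flushJournalToDatabase();   // runs on the last reader's thread
        }
    }

    private void flushJournalToDatabase() {
        // replay of committed journal entries into the main database would go here
    }
}
{code}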

A possible change: after some limit, start to block new readers; this lets the active transactions drain and so the transaction log can be flushed.
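
A rough sketch of what such a gate could look like (the limit and all names here are invented, not existing TDB code):

{code:java}
// Sketch of the proposed throttle, not existing TDB code: once the backlog of
// committed-but-unflushed transactions passes a limit, new readers are held at
// the gate, the in-flight transactions drain, the journal is flushed, and the
// gate reopens.
class ReaderGate {
    private static final int BACKLOG_LIMIT = 100;   // arbitrary, for illustration

    private int activeReaders = 0;
    private int backlog = 0;

    synchronized void beginRead() throws InterruptedException {
        while (backlog >= BACKLOG_LIMIT) {
            wait();                       // new readers blocked until the flush
        }
        activeReaders++;
    }

    synchronized void endRead() {
        activeReaders--;
        maybeFlush();
    }

    synchronized void commitWrite() {
        backlog++;                        // one more transaction sitting in the journal
        maybeFlush();
    }

    private void maybeFlush() {
        // Only possible when no reader still needs the pre-flush view.
        if (activeReaders == 0 && backlog > 0) {
            flushJournalToDatabase();
            backlog = 0;
            notifyAll();                  // reopen the gate for waiting readers
        }
    }

    private void flushJournalToDatabase() { /* journal write-back would go here */ }
}
{code}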

See JENA-567.  Please could you try that?  It is aimed at scalable transactions but may help here - it would postpone the point where the system locks up, but not remove the problem altogether.

> Fuseki/TDB memory leak for concurrent updates/queries
> -----------------------------------------------------
>
>                 Key: JENA-689
>                 URL: https://issues.apache.org/jira/browse/JENA-689
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Fuseki, TDB
>    Affects Versions: Fuseki 1.0.1, TDB 1.0.1
>         Environment: OSX 10.9.2, 2.8 GHz, 16G RAM, SSD
> and CentOS release 6.4, x86_64, 64G RAM, SSD
>            Reporter: Ola Bildtsen
>         Attachments: FusekiTest.tar.gz, No Union Graph Test.png, Original Test.png, config-no-union.ttl, out.txt, out2.txt, query-no-union.groovy, query.groovy
>
>
> When running concurrent POST updates and queries against a Fuseki/TDB server, the server appears to bleed memory until it eventually runs out and dies with:
> {{java.lang.OutOfMemoryError: GC overhead limit exceeded}}
> Using the included TDB config file, sample data file, and Groovy script, the Fuseki/TDB server can consistently be knocked down.  The script runs four concurrent threads: one that repeatedly POSTs data (in separate contexts/graphs) and three that query the server for triple counts.
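> The attached {{query.groovy}} is the authoritative test; the following is only a rough sketch in Java of the shape of that load (endpoint URLs, graph names, and data sizes here are made up):
> {code:java}
> // Rough sketch of the shape of the load only - see the attached query.groovy
> // for the real test.  Endpoint URLs, graph names and data sizes are made up.
> import com.hp.hpl.jena.query.QueryExecution;
> import com.hp.hpl.jena.query.QueryExecutionFactory;
> import com.hp.hpl.jena.update.UpdateExecutionFactory;
> import com.hp.hpl.jena.update.UpdateFactory;
>
> public class ConcurrentLoad {
>     static final String QUERY_EP  = "http://localhost:3030/ds/query";    // assumed service names
>     static final String UPDATE_EP = "http://localhost:3030/ds/update";
>
>     public static void main(String[] args) {
>         // One writer thread: POST data into a fresh named graph each time round.
>         new Thread(new Runnable() {
>             public void run() {
>                 int i = 0;
>                 while (true) {
>                     String u = "INSERT DATA { GRAPH <http://example/g" + (++i) + "> "
>                              + "{ <http://example/s> <http://example/p> " + i + " } }";
>                     UpdateExecutionFactory.createRemote(UpdateFactory.create(u), UPDATE_EP).execute();
>                 }
>             }
>         }).start();
>
>         // Three reader threads: count triples over and over, so read
>         // transactions on the server keep overlapping.
>         for (int t = 0; t < 3; t++) {
>             new Thread(new Runnable() {
>                 public void run() {
>                     while (true) {
>                         QueryExecution qe = QueryExecutionFactory.sparqlService(
>                             QUERY_EP, "SELECT (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } }");
>                         try { qe.execSelect().hasNext(); } finally { qe.close(); }
>                     }
>                 }
>             }).start();
>         }
>     }
> }
> {code}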
> To execute the script, do the following:
> # Install Groovy
> # Download and install jena-fuseki-1.0.1
> # Download the attached file {{FusekiTest.tar.gz}} and untar it in the jena-fuseki directory
> # Edit the {{fuseki-server}} script, set the max heap size to 2G ({{-Xmx2G}})
> # Start the server with: {{./fuseki-server --config=config-test.ttl}}
> # In a separate window/shell, execute: {{groovy query.groovy}}
> # Wait a few minutes for the OOME to occur.  The script will output some stats.
> A typical run of the script will result in:
> {quote}
> Added context #1
> Added context #2
> Added context #3
> Added context #4
> Added context #5
> Added context #6
> Added context #7
> Added context #8
> Added context #9
> Query thread dying
> Total contexts added: 9
> Total triples added: 4500000
> Total successful queries: 155
> {quote}
> While this simple test fails consistently on OSX against a Fuseki/TDB server running with a 2G heap, we've also observed the same failure on CentOS with a 16GB max heap while monitoring with NewRelic.  It took a lot longer, but the end result was the same: the heap and all its regions (eden, survivor, and old gen) eventually converge on their maximums and the JVM fails.
> It's interesting to note that if all the contexts/graphs are added FIRST (with no concurrent queries), everything works just fine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)