Posted to dev@lucene.apache.org by "Shawn Heisey (JIRA)" <ji...@apache.org> on 2015/03/30 17:14:54 UTC

[jira] [Comment Edited] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems

    [ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386834#comment-14386834 ] 

Shawn Heisey edited comment on SOLR-7319 at 3/30/15 3:14 PM:
-------------------------------------------------------------

Good questions, [~jim.ferenczi].  The option does appear to have helped GC pause times for me, although it's hard to quantify.  I know that the *average* GC pause time dropped from 0.10 sec to 0.06 sec with Java 7 and G1GC.  This isn't a lot, but when there are thousands of collections, even a small difference like that adds up.  I wish I had a way to gather median, 75th, 95th, and 99th percentile info on GC pauses.  I have some indexes running on Java 8, but they are not yet big enough or active enough to give me useful GC info.  They are growing, and will soon be pushed into production.

I do not have any info on this problem with CMS, which is what the bin/solr script in 5.0 uses.

If you know something about how Lucene writes to disk that says it's not mmap when the directory is mmap, then you know more than I do.  I wonder whether heavy mmap *reads* might interfere with writing to the stats file.
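
On the wish above for median and percentile figures: one possible way to get them is to post-process the GC log. The sketch below is only an illustration, not something from the patch. It assumes the log was written with -XX:+PrintGCDetails, so that each collection carries a "real=N secs" field, and it treats that wall-clock value as the pause time, which is an approximation (concurrent phases can emit the same field). The function names pause_times and pause_summary are hypothetical.

```python
import re
import statistics

# Assumption: lines contain a HotSpot "[Times: user=... sys=..., real=0.10 secs]"
# fragment, as printed when -XX:+PrintGCDetails is enabled.
PAUSE_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def pause_times(log_lines):
    """Extract the wall-clock 'real' durations (seconds) from GC log lines."""
    return [float(m.group(1))
            for line in log_lines
            for m in PAUSE_RE.finditer(line)]


def pause_summary(pauses):
    """Return mean, median, and 75th/95th/99th percentile of the pauses.

    Needs at least two data points, because statistics.quantiles does.
    """
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(pauses, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(pauses),
        "median": statistics.median(pauses),
        "p75": cuts[74],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

Usage would be along the lines of pause_summary(pause_times(open("gc.log"))), where gc.log is a placeholder path. Comparing the p95 and p99 values with and without the option would say more about the long pauses than the average does.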



was (Author: elyograg):
Good questions, [~jim.ferenczi].  The option does appear to have helped GC pause times for me, although it's hard to quantify.  I know that the *average* GC pause time dropped from 0.10 sec to 0.06 sec.  This isn't a lot, but when there are thousands of collections, even a small difference like that adds up.  I wish I had a way to gather median, 75th, 95th, and 99th percentile info on GC pauses.

If you know something about how Lucene writes to disk that says it's not mmap when the directory is mmap, then you know more than I do.  I wonder whether heavy mmap *reads* might interfere with writing to the stats file.


> Workaround the "Four Month Bug" causing GC pause problems
> ---------------------------------------------------------
>
>                 Key: SOLR-7319
>                 URL: https://issues.apache.org/jira/browse/SOLR-7319
>             Project: Solr
>          Issue Type: Bug
>          Components: scripts and tools
>    Affects Versions: 5.0
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>             Fix For: 5.1
>
>         Attachments: SOLR-7319.patch, SOLR-7319.patch, SOLR-7319.patch
>
>
> A Twitter engineer found a bug in the JVM that contributes to GC pause problems:
> http://www.evanjones.ca/jvm-mmap-pause.html
> Problem summary (in case the blog post disappears):  The JVM calculates statistics on things like garbage collection and writes them to a file in the temp directory using mmap.  Heavy mmap write activity elsewhere, which is precisely how Lucene accomplishes indexing and merging, can delay the mmap write to that temp file, and the delay shows up as a longer GC pause.
> We should implement the workaround in the solr start scripts (disable creation of the mmap statistics tempfile) and document the impact in CHANGES.txt.
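
For illustration, the workaround described above amounts to passing one extra option to the JVM. The flag recommended in the linked blog post is -XX:+PerfDisableSharedMem. The Python sketch below only shows the idea of adding that flag to an argument list. The helper name is hypothetical, and this is not the code from the attached patches, which change the start scripts themselves.

```python
# Flag recommended in the linked blog post: it stops HotSpot from creating
# the memory-mapped hsperfdata statistics file in the temp directory.
PERF_FLAG = "-XX:+PerfDisableSharedMem"
# The opposite setting; dropped below so the two never conflict.
PERF_FLAG_OFF = "-XX:-PerfDisableSharedMem"


def with_perf_shared_mem_disabled(jvm_args):
    """Return a copy of jvm_args that ends up containing PERF_FLAG once.

    Hypothetical helper for illustration only.
    """
    args = [a for a in jvm_args if a != PERF_FLAG_OFF]
    if PERF_FLAG not in args:
        args.append(PERF_FLAG)
    return args
```

For example, with_perf_shared_mem_disabled(["-Xmx4g", "-XX:+UseG1GC"]) returns the same list with the flag appended. The impact worth documenting is that tools which read the statistics file, such as jstat and jps, can no longer see a JVM started this way.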


