Posted to dev@jackrabbit.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/01/08 22:15:00 UTC

[jira] Commented: (JCR-1931) SharedFieldCache$StringIndex memory leak causing OOM's

    [ https://issues.apache.org/jira/browse/JCR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662122#action_12662122 ] 

Jukka Zitting commented on JCR-1931:
------------------------------------

Your solution sounds reasonable. Do you already have a patch for it?

I'm planning to cut the 1.5.1 release early next week. Can we have this in trunk and tested before that?

> SharedFieldCache$StringIndex memory leak causing OOM's 
> -------------------------------------------------------
>
>                 Key: JCR-1931
>                 URL: https://issues.apache.org/jira/browse/JCR-1931
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 1.5.0
>            Reporter: Ard Schrijvers
>            Assignee: Ard Schrijvers
>            Priority: Critical
>             Fix For: 1.5.1
>
>         Attachments: OrderByOOMTest.java
>
>
> SharedFieldCache$StringIndex is not working properly. It is meant to cache the Lucene document numbers along with the term to sort on. The problem is twofold. I have a solution for the second issue; the first is not really solvable from Jackrabbit's point of view, because Lucene index readers already cache Terms heavily.
> Explanation of the problem:
> For *each* unique property that is sorted on, a new Lucene ScoreDocComparator is created (see SharedFieldComparator newComparator). This new comparator creates, *per* Lucene index reader, a SharedFieldCache.StringIndex, which is stored in a WeakHashMap keyed by the index reader. As this index reader can almost *never* be garbage collected (only once it has been merged and is thus no longer used), the SharedFieldCache.StringIndex instances stay around for the rest of the JVM's life (which is sometimes short, as the simple unit test attached shows). Obviously, this leads to an OOM pretty quickly.
> 1) issue one:  The cached terms[] in SharedFieldCache.StringIndex can become huge when you sort on a common property (date) which is present in a lot of nodes. If you sort on large properties, like 'title', this SharedFieldCache.StringIndex will quickly use hundreds of MB for a couple of hundred thousand nodes with a title. This is really a Lucene issue, as Lucene already caches the terms. OTOH, I really doubt whether we should index long string values as UNTOKENIZED in Lucene at all. A half-working solution might be a two-step sort, where the first comparison is on the first 10 chars, and only if that comparison returns 0 is the entire string used to sort on.
> 2) issue two:  The cached terms[] in SharedFieldCache.StringIndex is frequently sparse, consuming an incredible amount of memory for string arrays containing mainly null values. For example (see attached unit test):
> - add 1.000.000 nodes
> - do a query and sort on a non-existing property
> - you'll lose 1.000.000 * 4 bytes ~ 4 MB of memory
> - sort on another non-existing prop: another 4 MB is lost
> - do it 100 times --> 400 MB is lost, and can't be reclaimed
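
The growth described in the steps above can be modeled with a rough sketch. This is not Jackrabbit's code: `FakeReader` stands in for a Lucene index reader, and the class and method names are hypothetical. It only assumes what the description states, namely a WeakHashMap keyed by the reader and one array slot per document for every sorted property, whether or not any node has that property.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical model of a per-reader, per-field term cache. */
final class ReaderCacheGrowthSketch {

    /** Stand-in for a Lucene index reader (assumption, not the real class). */
    static final class FakeReader {
        final int maxDoc;
        FakeReader(int maxDoc) { this.maxDoc = maxDoc; }
    }

    // Entries can only disappear once the reader itself is unreachable.
    private final Map<FakeReader, Map<String, String[]>> cache = new WeakHashMap<>();

    /** Returns the cached terms array for a field, allocating it on first use. */
    String[] termsFor(FakeReader reader, String field) {
        return cache.computeIfAbsent(reader, r -> new HashMap<>())
                    .computeIfAbsent(field, f -> new String[reader.maxDoc]);
    }

    /** Total number of array slots currently retained for one reader. */
    long retainedSlots(FakeReader reader) {
        long slots = 0;
        Map<String, String[]> perField = cache.get(reader);
        if (perField != null) {
            for (String[] terms : perField.values()) {
                slots += terms.length;
            }
        }
        return slots;
    }
}
```

Sorting on 100 different properties against one live reader retains 100 arrays in this model. With 1.000.000 documents and 4-byte references that matches the roughly 400 MB mentioned in the description.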
> I'll attach a solution which works really well for me; it still has the almost unavoidable memory absorption, but makes it much smaller. The solution is that if < 10% of the String array is filled, I consider the array sparse and move to a HashMap solution. Performance does not decrease much (and with large sparsity it increases, because less memory consumption --> less gc, etc).
> Perhaps it does not seem to be a common issue (the unit test certainly does not look like a common case), but memory snapshots of our production environments indicate that most memory is held by the SharedFieldCache$StringIndex (and the Lucene Terms, which is harder to avoid).
> I'd like to see this in 1.5.1 if others are OK with it.
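
A minimal sketch of the array-or-HashMap idea described above, under stated assumptions: the class name, the `termFor` accessor and the way the threshold is applied are illustrative and are not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keep terms in an array, or in a map when the array is sparse. */
final class SparseTermsSketch {

    /** Assumed threshold from the description: less than 10% filled counts as sparse. */
    static final double SPARSE_THRESHOLD = 0.10;

    private final String[] dense;              // used when enough slots are filled
    private final Map<Integer, String> sparse; // used when the array is mostly null

    SparseTermsSketch(String[] terms) {
        int filled = 0;
        for (String term : terms) {
            if (term != null) {
                filled++;
            }
        }
        if (terms.length > 0 && filled < terms.length * SPARSE_THRESHOLD) {
            // Copy only the non-null entries so the large array can be dropped.
            Map<Integer, String> map = new HashMap<>(Math.max(16, filled * 2));
            for (int doc = 0; doc < terms.length; doc++) {
                if (terms[doc] != null) {
                    map.put(doc, terms[doc]);
                }
            }
            this.sparse = map;
            this.dense = null;
        } else {
            this.dense = terms;
            this.sparse = null;
        }
    }

    boolean isSparse() {
        return sparse != null;
    }

    /** Returns the term for a document number, or null if the document has none. */
    String termFor(int doc) {
        return sparse != null ? sparse.get(doc) : dense[doc];
    }
}
```

The trade-off in this sketch is that a map entry costs far more than one array slot, so the map only pays off when most slots would be null. For a property that no node has, the map stays empty instead of holding one null reference per document.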
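
For the two-step sort suggested under issue one in the description, a rough sketch might look as follows. It assumes that only the 10-character prefixes are cached and that full values can be fetched on demand through a loader; `fullValueLoader` and the class itself are hypothetical, and every document is assumed to have a value.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch: compare cached prefixes first, full values only on a tie. */
final class TwoStepSortSketch {

    static final int PREFIX_LENGTH = 10;

    private final String[] prefixes;                   // small, cached per document
    private final IntFunction<String> fullValueLoader; // assumed on-demand lookup

    TwoStepSortSketch(int docCount, IntFunction<String> fullValueLoader) {
        this.fullValueLoader = fullValueLoader;
        this.prefixes = new String[docCount];
        for (int doc = 0; doc < docCount; doc++) {
            String value = fullValueLoader.apply(doc);
            prefixes[doc] = value.length() <= PREFIX_LENGTH
                    ? value
                    : value.substring(0, PREFIX_LENGTH);
        }
    }

    /** Orders two documents the same way String.compareTo orders their full values. */
    int compare(int docA, int docB) {
        int byPrefix = prefixes[docA].compareTo(prefixes[docB]);
        if (byPrefix != 0) {
            return byPrefix;
        }
        // Prefixes tie, so fall back to the complete strings.
        return fullValueLoader.apply(docA).compareTo(fullValueLoader.apply(docB));
    }
}
```

Whether this helps in practice depends on how expensive the on-demand lookup is and how often prefixes tie; values such as dates that share their first 10 characters would hit the slow path frequently.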

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.