Posted to solr-user@lucene.apache.org by Jeff Moss <jm...@heavyobjects.com> on 2010/10/07 23:56:09 UTC

Re: Memory usage

Taking Chris's information into account, I was able to isolate this to a test
case. I found this ticket, which seems to indicate a fundamental problem at
the Solr/Lucene boundary:

https://issues.apache.org/jira/browse/SOLR-1111

Here's how to reproduce my results:
1. Create an index with a field like this:

    <fieldType name="sint" class="solr.SortableIntField"
               sortMissingLast="true" omitNorms="true"/>
    <dynamicField name="foo_*" type="sint" indexed="true" stored="false"
                  omitNorms="true"/>

2. Populate the index with test data; the more the better.

3. Run a set of queries that sort on foo_1 through foo_1000 or so; looping
over that many sort fields ought to fill up any heap (a sketch of such a loop
is below).
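
A sketch of the sort-query loop I have in mind, assuming a local Solr
instance at http://localhost:8983/solr with the default select handler (the
URL and field range are placeholders for my setup):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Issues one sorted query per dynamic field; each query forces Lucene to
    // build a FieldCache array for that foo_N field, which is what eats the heap.
    public class SortLoop {
        public static void main(String[] args) throws Exception {
            String select = "http://localhost:8983/solr/select";  // adjust for your instance
            for (int i = 1; i <= 1000; i++) {
                URL url = new URL(select + "?q=*:*&rows=0&sort=foo_" + i + "+asc");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                InputStream in = conn.getInputStream();
                while (in.read() != -1) { /* drain the response */ }
                in.close();
                conn.disconnect();
            }
        }
    }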

Here is what my heap looks like after running this test (graph attached). Can
anybody familiar with the issue above (SOLR-1111) tell me whether that is what
is going on here, or whether I need to file a new bug?

Thanks,

-Jeff

On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> : There are 14,696,502 documents, and we are doing a lot of funky stuff, but
> : I'm not sure which is most likely to cause an impact. We're sorting on a
> : dynamic field; there are about 1000 different variants of this field that
> : look like "priority_sort_for_<client_id>", which is an integer field. I've
> : heard that sorting can have a big impact on memory consumption; could that
> : be it?
>
> Sorting on a field requires that an array of the corresponding type be
> constructed for that field - the size of the array is the size of maxDoc
> (ie: the number of documents in your index, including deleted documents).
>
> If you are using TrieInts and have an index with no deletions, sorting
> ~14.7 million docs on 1000 different int fields will take up about ~55GB.
>
> That's a minimum just for the sorting of those int fields (SortableIntField,
> which keeps a string version of the field value, will be significantly
> bigger) and doesn't take into consideration any other data structures used
> for searching.
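
(A quick back-of-the-envelope check of that ~55GB figure, assuming one plain
4-byte int per document per sorted field, as in the TrieInt case:)

    // Rough arithmetic behind the ~55GB estimate: one FieldCache array of
    // maxDoc entries is built per field that gets sorted on.
    public class CacheEstimate {
        public static void main(String[] args) {
            long maxDoc = 14696502L;     // docs in the index, including deleted ones
            long bytesPerEntry = 4L;     // a plain int per document
            long sortedFields = 1000L;   // one priority_sort_for_* field per client
            double gb = maxDoc * bytesPerEntry * sortedFields / (1024.0 * 1024 * 1024);
            System.out.printf("%.1f GB%n", gb);  // ~54.7 GB, in line with the estimate
        }
    }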
>
> I'm not a GC expert, but based on my limited understanding your graph
> actually seems fine to me .. particularly the part where it says
> you've configured a max heap of ~122GB of RAM, and it has
> never spent any time doing ConcurrentMarkSweep.  My uneducated
> understanding of those two numbers is that you've told the JVM it can use
> an ungodly amount of RAM, so it is.  It's done some basic cleanup of
> young gen (ParNew), but because the heap size has never gone above 50GB,
> it hasn't found any reason to actually start a CMS GC to look for dead
> objects in Old Gen that it can clean up.
>
>
> (Can someone who understands GC and JVM tuning better than me please
> sanity check me on that?)
>
>
> -Hoss
>
> --
> http://lucenerevolution.org/  ...  October 7-8, Boston
> http://bit.ly/stump-hoss      ...  Stump The Chump!
>
>