Posted to solr-user@lucene.apache.org by Rok Rejc <ro...@gmail.com> on 2010/12/21 22:02:19 UTC

Faceting memory requirements

Dear all,

I have created an index with approx. 1.1 billion documents (around 500GB)
running on Solr 1.4.1 (64-bit JVM).

I want to enable faceted navigation on an int field, which contains around
250 unique values.
According to the wiki there are two methods:

facet.method=fc, which uses the field cache. This method should use MaxDoc*4
bytes of memory, which is around 4.1GB.

facet.method=enum, which creates a bitset for each unique value. This method
should use NumberOfUniqueValues * SizeOfBitSet, which is around 32GB.
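
For reference, here is the back-of-the-envelope arithmetic behind those two
numbers (just a sketch of my assumptions, with maxDoc taken as roughly 1.1
billion):

      public class FacetMemoryEstimate {
          public static void main(String[] args) {
              long maxDoc = 1100000000L;   // ~1.1 billion documents
              long uniqueValues = 250L;    // distinct values in the facet field

              // facet.method=fc: one int (4 bytes) per document
              long fcBytes = maxDoc * 4L;

              // facet.method=enum: one bitset (maxDoc/8 bytes) per unique value
              long enumBytes = uniqueValues * (maxDoc / 8L);

              System.out.println("fc:   ~" + fcBytes / (1024L * 1024 * 1024) + " GB");
              System.out.println("enum: ~" + enumBytes / (1024L * 1024 * 1024) + " GB");
          }
      }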

Are my calculations correct?

My memory settings in Tomcat (Windows) are:
Initial memory pool: 4096 MB
Maximum memory pool: 8192 MB (total 12GB in my test machine)
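
I believe those pool settings correspond to roughly the following JVM flags
(the -Xms/-Xmx mapping is my understanding of how the Tomcat service wrapper
passes them):

      -Xms4096m -Xmx8192m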

I have tried to run a query
(...&facet=true&facet.field=PublisherId&facet.method=fc) but I am still
getting OOM:

HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap
space at
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:703)
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
at
org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:350)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:255)
at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
at
...

Any idea what am I doing wrong, or have I miscalculated the memory
requirements?

Many thanks,
Rok

Re: Faceting memory requirements

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Dec 21, 2010 at 4:02 PM, Rok Rejc <ro...@gmail.com> wrote:
> Dear all,
>
> I have created an index with approx. 1.1 billion documents (around 500GB)
> running on Solr 1.4.1 (64-bit JVM).
>
> I want to enable faceted navigation on an int field, which contains around
> 250 unique values.
> According to the wiki there are two methods:
>
> facet.method=fc, which uses the field cache. This method should use MaxDoc*4
> bytes of memory, which is around 4.1GB.

facet.method=fc uses the fieldcache, but it currently uses the StringIndex
for all field types, so you need to add in space for the string
representation of all the unique values.  But this is only 250, so given
the large number of docs, your estimate should still be close.
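
To put a rough number on that string overhead (my estimate):

      250 unique terms * a few tens of bytes each  ~  a few KB

which is negligible next to the maxDoc*4 part.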

> facet.method=enum, which creates a bitset for each unique value. This method
> should use NumberOfUniqueValues * SizeOfBitSet, which is around 32GB.

A more efficient representation is used for a set when the set size is
less than maxDoc/64. This set type uses an int per doc in the set, so it
should use roughly the same amount of memory as a numeric fieldcache entry.
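
A rough sketch of that size-based choice (illustrative only -- not Solr's
actual DocSet code, and the class and method names are made up):

      class DocSetSizeSketch {
          // Small sets: a sorted int[] with one int (4 bytes) per matching doc.
          // Large sets (>= maxDoc/64 docs): a full bitset with one bit per doc.
          static long docSetBytes(long setSize, long maxDoc) {
              return (setSize < maxDoc / 64) ? setSize * 4L : maxDoc / 8L;
          }
      }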


> Are my calculations correct?
>
> My memory settings in Tomcat (Windows) are:
> Initial memory pool: 4096 MB
> Maximum memory pool: 8192 MB (total 12GB in my test machine)
>
> I have tried to run a query
> (...&facet=true&facet.field=PublisherId&facet.method=fc) but I am still
> getting OOM:
>
> HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap
> space at
> org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:703)
> at
> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
> at
> org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
> at
> org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:350)
> at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:255)
> at
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
> at
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
> at
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
> at
> ...
>
> Any idea what am I doing wrong, or have I miscalculated the memory
> requirements?

Perhaps you are already sorting by another field or faceting on another
field that is already using a lot of memory, and this pushes it over the
edge? Or perhaps the JVM simply can't find a contiguous area of memory
this large?

Line 703 is this, so it's failing to create the first array:
      final int[] retArray = new int[reader.maxDoc()];

Although the line after it is even more troublesome:
      String[] mterms = new String[reader.maxDoc()+1];

Although you only need an array of 250 entries to contain all the unique
terms, the FieldCacheImpl starts out with an array sized to maxDoc.
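
Just to put rough numbers on those two allocations (assuming maxDoc of about
1.1 billion; the per-reference size depends on whether compressed oops are in
use):

      int[maxDoc]       ->  ~1.1e9 * 4 bytes        ~ 4.4 GB
      String[maxDoc+1]  ->  ~1.1e9 * 4 or 8 bytes   ~ 4.4 - 8.8 GB (references alone)

Either of those is a big chunk of an 8GB heap, and each must be a single
contiguous array.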

I think trunk will be far better in this regard.  You should also try
facet.method=enum, though.
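
That would be the same request as before, just with the method switched, e.g.:

      ...&facet=true&facet.field=PublisherId&facet.method=enum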

-Yonik
http://www.lucidimagination.com