
Posted to solr-user@lucene.apache.org by Robert Petersen <ro...@buy.com> on 2010/06/30 01:32:27 UTC

OOM on uninvert field request

Hello, I am trying to find the right max and min heap settings for Java 1.6 on a 20GB index with 8 million docs, running the 1.6_018 JVM with Solr 1.4.  Java is currently set to an even 4GB for both min and max (export JAVA_OPTS="-Xmx4096m -Xms4096m"), which is doing pretty well, but we are occasionally still getting the OOM errors below.  We're running on dual quad-core Xeons with 16GB of memory installed.

Is the memSize mentioned in the INFO line for the uninvert in bytes?  I.e., does memSize=29604020 mean 29MB?  We have a few hundred of these fields and they contain ints used as IDs, so I guess they could eat all the memory once we apply load and enough queries are performed to uninvert them all.  Does the field type matter?  Would int be better than string if these are lookup IDs sparsely populated across the index?  BTW, these are used for faceting and filtering only.

		<dynamicField name="*_contentAttributeToken"  type="string"  indexed="true" multiValued="true"   stored="true" required="false"/>

Jun 29, 2010 3:54:50 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field {field=768_contentAttributeToken,memSize=29604014,tindexSize=50,time=1841,phase1=1824,nTerms=1,bigTerms=0,termInstances=18,uses=0}
Jun 29, 2010 3:54:52 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field {field=749_contentAttributeToken,memSize=29604020,tindexSize=56,time=1847,phase1=1829,nTerms=143,bigTerms=0,termInstances=951,uses=0}
Jun 29, 2010 3:54:59 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:191)
	at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
	at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
	at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:250)
	at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
	at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
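
As a rough check on the memSize question, the two INFO lines above can be parsed and converted to megabytes. This is only a sketch: the regular expression and function names are made up for illustration, the log lines are trimmed to the relevant keys, and the count of 300 fields is an assumed stand-in for "a few hundred".

```python
import re

# The two UnInvertedField INFO lines from the log above, trimmed to the relevant keys.
LOG_LINES = [
    "INFO: UnInverted multi-valued field {field=768_contentAttributeToken,memSize=29604014,tindexSize=50}",
    "INFO: UnInverted multi-valued field {field=749_contentAttributeToken,memSize=29604020,tindexSize=56}",
]

# Hypothetical pattern for this log format; not part of Solr.
MEMSIZE_RE = re.compile(r"field=([^,]+),memSize=(\d+)")


def parse_memsizes(lines):
    """Return {field_name: memSize_in_bytes} for each matching log line."""
    sizes = {}
    for line in lines:
        match = MEMSIZE_RE.search(line)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


def projected_bytes(bytes_per_field, field_count):
    """Naive projection: every field costs about the same once uninverted."""
    return bytes_per_field * field_count


sizes = parse_memsizes(LOG_LINES)
for name, size in sizes.items():
    print(f"{name}: {size} bytes = {size / 1e6:.1f} MB")

# ASSUMPTION: 300 fields stands in for "a few hundred".
total = projected_bytes(29604020, 300)
print(f"300 fields: {total / 1e9:.2f} GB against a 4 GB heap")
```

If every field really costs about the same once uninverted, a few hundred of them would need several times the 4GB heap, which would be consistent with the OOM appearing only after enough distinct fields have been faceted on.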

RE: OOM on uninvert field request

Posted by Robert Petersen <ro...@buy.com>.

Hey so after adding those GC options, I was able to incrementally push my max (and min) memory settings up and when we got to max=min=12GB we started looking much better!  One slave handles all the load with no OOMs at all!  I'm watching the live tomcat log using 'tail'.  Next I will convert that field type to (trie) int and reindex.  I'll have to start a new index from scratch with a field type change like that so I'll have to delete the old one first on our master... It takes us a couple days to index 15 million products (some are sets so the final index size is only 8 million) so I don't want to do *that* too often as the slaves will be quite stale by the time it's done!  :)

Thanks for the help!
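
For reference, a small sketch of how the settings described here could be assembled into one JAVA_OPTS line. It assumes the same export form as the original 4GB setting combined with the i-cms flags quoted in this thread; the exact option string used at 12GB is not given, and the helper name is hypothetical.

```python
def build_java_opts(heap_mb, gc_flags):
    """Build a JAVA_OPTS string with equal max and min heap sizes."""
    heap = [f"-Xmx{heap_mb}m", f"-Xms{heap_mb}m"]
    return " ".join(heap + list(gc_flags))


# i-cms and GC logging flags as quoted in this thread.
GC_FLAGS = [
    "-XX:+UseConcMarkSweepGC",
    "-XX:+CMSIncrementalMode",
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCTimeStamps",
]

# ASSUMPTION: 12GB expressed in megabytes, mirroring the -Xmx4096m style.
opts = build_java_opts(12 * 1024, GC_FLAGS)
print(f'export JAVA_OPTS="{opts}"')
```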

RE: OOM on uninvert field request

Posted by Robert Petersen <ro...@buy.com>.

At and above 4GB we get those GC errors though!  Should I switch to something like this?

Recommended Options
To use i-cms in Java SE 6, use the following command line options:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps


Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
	at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467)
	at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
	... 11 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded


GC tuning - heap size autoranging

Posted by Robert Petersen <ro...@buy.com>.

Is this a true statement???  This seems to contradict other statements regarding setting the heap size I have seen here...

Default Heap Size
If not otherwise set on the command line, the initial and maximum heap sizes are calculated based on the amount of memory on the machine. The proportion of memory to use for the heap is controlled by the command line options DefaultInitialRAMFraction and DefaultMaxRAMFraction, as shown in the table below. (In the table, memory represents the amount of memory on the machine.)

Pasted from <http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting>

Re: OOM on uninvert field request

Posted by Lance Norskog <go...@gmail.com>.

Yes, it is better to use ints for ids than strings. Also, the Trie int
fields have a compressed format that may cut the storage needs even
more. 8m * 4 bytes = 32mb per field, times "a few hundred", we'll say
300, is about 9.6gb of IDs.  I don't know how these fields are stored,
but if they are separate objects it blows up several times further
(per-object overheads are surprising).

4G is probably not enough for what you want. If you watch the total
memory with 'top' and hit it with different queries, you will get a
stronger sense of how much memory your use cases need.

-- 
Lance Norskog
goksron@gmail.com

Re: OOM on uninvert field request

Posted by Chris Hostetter <ho...@fucit.org>.

: Subject: OOM on uninvert field request
: In-Reply-To: <12...@kratos>
: References: <12...@kratos>
:     <9F...@gmail.com>
:     <9e...@sm.webmail.pair.com>
:     <AA...@mail.gmail.com>
:     <F2...@cominvent.com>
:     <12...@kratos>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss

Re: tomcat solr logs

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Hey Robert,

You may want to check out Flume for log file collection:
http://github.com/cloudera/flume. We don't currently allow Flume to populate
a Solr index, but that would be quite an interesting use case!

Later,
Jeff

On Wed, Jun 30, 2010 at 3:06 PM, Robert Petersen <ro...@buy.com> wrote:

> Sorry if this is at all off topic.  Our solr log files need grooming and we
> would also like to analyze them, perhaps pulling various data points into a
> DB table, is there a preferred app for doing log file analysis and/or an
> easy way to delete the old log files?
>

tomcat solr logs

Posted by Robert Petersen <ro...@buy.com>.

Sorry if this is at all off topic.  Our Solr log files need grooming and we would also like to analyze them, perhaps pulling various data points into a DB table.  Is there a preferred app for doing log file analysis and/or an easy way to delete the old log files?
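
One simple way to groom old log files is a small script run on a schedule. The sketch below is an assumption-laden example, not a recommended tool: the function name, the ".log" suffix, and the 14-day cutoff are all illustrative, and it deletes files by modification time only.

```python
import os
import time


def delete_old_logs(log_dir, max_age_days, suffix=".log", now=None):
    """Delete files in log_dir that end with suffix and are older than max_age_days.

    Returns the sorted list of deleted file names.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    deleted = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if not (name.endswith(suffix) and os.path.isfile(path)):
            continue
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            deleted.append(name)
    return sorted(deleted)


# Example call (hypothetical path, adjust to the real Tomcat log directory):
# delete_old_logs("/path/to/tomcat/logs", max_age_days=14)
```

Passing `now` explicitly makes the cutoff easy to test without waiting for files to age.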

Re: OOM on uninvert field request

Posted by Lance Norskog <go...@gmail.com>.

On Wed, Jun 30, 2010 at 1:38 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen <ro...@buy.com> wrote:
>> Hello I am trying to find the right max and min settings for Java 1.6 on 20GB index with 8 million docs, running 1.6_018 JVM with solr 1.4, and am currently have java set to an even 4GB (export JAVA_OPTS="-Xmx4096m -Xms4096m") for both min and max which is doing pretty well but occasionally still getting the below OOM errors.  We're running on dual quad core xeons with 16GB memory installed.  I've been getting the below OOM exceptions still though.
>>
>> Is the memsize mentioned in the INFO for the uninvert in bytes? is memSize=29604020 mean 29MB?
>
> Yes.
>
>> We have a few hundred of these fields and they contain ints used as IDs, and so I guess could they eat all the memory to uninvert them all after we apply load and enough queries are performed.  Does the field type matter, would int be better than string if these are lookup ids sparsely populated across the index?
>
> No, using UnInvertedField faceting, the fieldType won't matter much at
> all for the space it takes up.
>
> The key here is that it looks like the number of unique terms in these
> fields is low - you would probably do much better with
> facet.method=enum (which iterates over terms rather than documents).
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Lance Norskog
goksron@gmail.com

Re: OOM on uninvert field request

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Jun 30, 2010 at 6:19 PM, Robert Petersen <ro...@buy.com> wrote:
> Most of these hundreds of facet fields have tens of values but a couple have thousands, is thousands of different values too many to do enum or is that still ok?  If so I could apply it carte blanche to the whole field...

enum can still handle thousands, but it is often slower (and remember to
increase the size of your filterCache, which will now see greater
usage).

I would do facet.method=enum for the default and then override that
for those few fields with thousands of unique terms via
f.123_contentAttributeToken.facet.method=fc

-Yonik
http://www.lucidimagination.com
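
The default-plus-override suggestion above can be sketched as request parameters. This is only an illustration: the host, port, and path are placeholders, the helper name is made up, and the same parameters could instead be set as handler defaults in solrconfig.xml.

```python
from urllib.parse import urlencode


def facet_params(facet_fields, fc_fields=()):
    """Build facet parameters: enum by default, fc for the listed fields."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.method", "enum")]
    for field in facet_fields:
        params.append(("facet.field", field))
    for field in fc_fields:
        # Per-field override in the f.<field>.facet.method form shown above.
        params.append((f"f.{field}.facet.method", "fc"))
    return params


params = facet_params(
    ["749_contentAttributeToken", "123_contentAttributeToken"],
    fc_fields=["123_contentAttributeToken"],
)
# ASSUMPTION: placeholder URL for a local Solr instance.
print("http://localhost:8983/solr/select?" + urlencode(params))
```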

RE: OOM on uninvert field request

Posted by Robert Petersen <ro...@buy.com>.

Most of these hundreds of facet fields have tens of values, but a couple have thousands.  Is thousands of different values too many for enum, or is that still OK?  If it is OK, I could apply it carte blanche to the whole field...

Re: OOM on uninvert field request

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen <ro...@buy.com> wrote:
> Hello I am trying to find the right max and min settings for Java 1.6 on 20GB index with 8 million docs, running 1.6_018 JVM with solr 1.4, and am currently have java set to an even 4GB (export JAVA_OPTS="-Xmx4096m -Xms4096m") for both min and max which is doing pretty well but occasionally still getting the below OOM errors.  We're running on dual quad core xeons with 16GB memory installed.  I've been getting the below OOM exceptions still though.
>
> Is the memsize mentioned in the INFO for the uninvert in bytes? is memSize=29604020 mean 29MB?

Yes.

> We have a few hundred of these fields and they contain ints used as IDs, and so I guess could they eat all the memory to uninvert them all after we apply load and enough queries are performed.  Does the field type matter, would int be better than string if these are lookup ids sparsely populated across the index?

No, using UnInvertedField faceting, the fieldType won't matter much at
all for the space it takes up.

The key here is that it looks like the number of unique terms in these
fields is low - you would probably do much better with
facet.method=enum (which iterates over terms rather than documents).

-Yonik
http://www.lucidimagination.com
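
The difference between iterating over terms and iterating over documents can be shown with a toy model. This is a conceptual sketch only: it uses plain Python sets and dicts with made-up data, and it does not reflect how Solr's UnInvertedField or filterCache are actually implemented.

```python
def facet_enum(term_to_docs, result_docs):
    """Term-driven (enum-style): intersect each term's doc set with the results."""
    counts = {}
    for term, docs in term_to_docs.items():
        n = len(docs & result_docs)
        if n:
            counts[term] = n
    return counts


def uninvert(term_to_docs):
    """Build the doc -> terms view that a document-driven method needs in memory."""
    doc_to_terms = {}
    for term, docs in term_to_docs.items():
        for doc in docs:
            doc_to_terms.setdefault(doc, []).append(term)
    return doc_to_terms


def facet_by_docs(doc_to_terms, result_docs):
    """Document-driven: walk each matching doc and count its terms."""
    counts = {}
    for doc in result_docs:
        for term in doc_to_terms.get(doc, ()):
            counts[term] = counts.get(term, 0) + 1
    return counts


# Toy index: term -> set of doc ids (illustrative data only).
index = {"red": {1, 2, 5}, "blue": {2, 3}, "green": {4}}
results = {1, 2, 3}

print(facet_enum(index, results))
print(facet_by_docs(uninvert(index), results))
```

Both functions return the same counts. In this toy model the term-driven version does one set intersection per unique term and needs no per-document structure, which is why it suits fields with few unique terms.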