Posted to dev@lucene.apache.org by Tom Burton-West <tb...@umich.edu> on 2013/11/08 19:56:44 UTC

Estimating peak memory use for UnInvertedField faceting

We are considering indexing our 11 million books at a page level, which
comes to about 3 billion Solr documents.

Our subject field is by necessity multi-valued, so the UnInvertedField is
used for faceting.

When testing an index of about 200 million documents, the first faceting
query on one field (query appended below) causes memory use to rise from
about 2.5 GB to 13 GB.  If I run GC after the query the memory use goes down
to about 3 GB, and subsequent queries don't significantly increase the memory
use.

After the query is run, various statistics from UnInvertedField are sent to
the log (see below), but they seem to represent the final data structure
rather than the peak.  For example, memSize is listed as 1.8 GB, while the
temporary data structures were probably closer to 10 GB (13 GB total).

Is there a formula for estimating the peak memory size?
Can the statistics spit out to INFO be used to somehow estimate the peak
memory size?

Tom
-----

Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField <init>
INFO: UnInverted multi-valued field {field=topicStr,
memSize=1,768,101,824,
tindexSize=86,028,
time=45,854,
phase1=41,039,
nTerms=271,987,
bigTerms=0,
termInstances=569,429,716,
uses=0}
Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute

INFO: [core] webapp=/dev-3 path=/select
params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml}
hits=138,605,690 status=0 QTime=49,797
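
(Not an official formula, but as a rough sanity check against the numbers in
the INFO line above: the sketch below derives bytes-per-termInstance from
memSize/termInstances, then applies an assumed growth factor for the temporary
arrays DocTermOrds builds while uninverting. The 4-bytes-per-doc index and the
3x-6x factor are assumptions calibrated only against the single observation in
this thread, roughly 1.8 GB final versus a ~10 GB transient bump, and are not
taken from the Solr source.)

// Back-of-envelope only: the constants below are assumptions calibrated
// against the single data point in this thread, not derived from Solr code.
public class UnInvertPeakGuess {
    public static void main(String[] args) {
        long maxDoc        = 200_000_000L;   // documents in the test index
        long memSize       = 1_768_101_824L; // final structure size, from the INFO line
        long termInstances = 569_429_716L;   // from the INFO line

        // ~3.1 bytes per term instance in the finished structure for this field
        double bytesPerInstance = (double) memSize / termInstances;

        // Assumed: a 4-byte-per-document index plus transient growable arrays
        // that briefly need several times the final memSize while uninverting.
        long docIndexBytes = 4L * maxDoc;
        for (double factor : new double[] {3.0, 6.0}) {
            long peakGuess = (long) (factor * memSize) + docIndexBytes;
            System.out.printf("factor %.0fx -> rough peak ~%.1f GB%n", factor, peakGuess / 1e9);
        }
        System.out.printf("final structure: %,d bytes (%.2f bytes per termInstance)%n",
                memSize, bytesPerInstance);
    }
}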

Re: Estimating peak memory use for UnInvertedField faceting

Posted by Iker Huerga <ik...@gmail.com>.
Hi,

A similar issue happened to us a while ago and we got around it using
facet.method=enum, see
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

By default I think it uses the Field Cache to do the grouping/counting, which
is very memory intensive.

Enum could take longer than fc, but I guess it depends on your requirements
whether that's acceptable or not.
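
For example (illustrative, just the query from the first message with the
extra parameter added, or f.topicStr.facet.method=enum to limit it to that
one field):

facet=true&facet.method=enum&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml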

Re: profiling JVM usage etc., I tend to use jstat, which gives (at least to
me) a clear picture of what's going on inside the JVM.  In this specific case
you will see that most of the memory is being used in the Old Generation
space; that's why the only way to clean it up is a full GC.
http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html

I use the following command

/usr/bin/java/jdk_.../jstat -gc pid 3s > /path/to/file
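
A variant that can be easier to read at a glance, since it reports
utilization as percentages rather than KB (same jstat tool, documented at the
link above):

/usr/bin/java/jdk_.../jstat -gcutil pid 3s > /path/to/file

There the O column is old-generation occupancy and FGC is the running count
of full collections, which is where you'd see the pattern described above.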

Hope this helps

Best
Iker






Re: Estimating peak memory use for UnInvertedField faceting

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Tom,

I believe Solr will automatically use DocValues for faceting if you've
defined them in the schema.
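
For reference, that just means declaring the field (or its fieldType) with
docValues="true" in schema.xml and re-indexing; something along these lines
for the field in Tom's logs (the type and other attributes here are a guess,
only the docValues attribute is the point):

<field name="topicStr" type="string" indexed="true" stored="true" multiValued="true" docValues="true"/>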

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Nov 11, 2013 at 11:33 AM, Tom Burton-West <tb...@umich.edu> wrote:
> Thanks Otis,
>
>  I'm looking forward to the presentation videos.
>
> I'll look into using DocValues.    Re-indexing 200 million docs will take a
> while though :).
> Will Solr automatically use DocValues for faceting if you have DocValues for
> the field or is there some configuration or parameter that needs to be set?
>
> Tom


Re: Estimating peak memory use for UnInvertedField faceting

Posted by Tom Burton-West <tb...@umich.edu>.
Thanks Otis,

 I'm looking forward to the presentation videos.

I'll look into using DocValues.  Re-indexing 200 million docs will take a
while though :).
Will Solr automatically use DocValues for faceting if you have DocValues
for the field or is there some configuration or parameter that needs to be
set?

Tom


On Sat, Nov 9, 2013 at 9:57 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi Tom,
>
> Check http://blog.sematext.com/2013/11/09/presentation-solr-for-analytics/
> .  It includes info about our experiment with DocValues, which clearly
> shows lower heap usage, which means you'll get further without getting
> this OOM.  In our experiments we didn't sort, facet, or group, and I
> see you are faceting, which means that DocValues, which are more
> efficient than FieldCache, should help you even more than it helped
> us.
>
> The graphs are from SPM, which you could use to monitor your Solr
> cluster, at least while you are tuning it.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/

Re: Estimating peak memory use for UnInvertedField faceting

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Tom,

Check http://blog.sematext.com/2013/11/09/presentation-solr-for-analytics/
.  It includes info about our experiment with DocValues, which clearly
shows lower heap usage, which means you'll get further without getting
this OOM.  In our experiments we didn't sort, facet, or group, and I
see you are faceting, which means that DocValues, which are more
efficient than FieldCache, should help you even more than it helped
us.

The graphs are from SPM, which you could use to monitor your Solr
cluster, at least while you are tuning it.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Nov 8, 2013 at 2:41 PM, Tom Burton-West <tb...@umich.edu> wrote:
> Hi Yonik,
>
> I don't know enough about JVM tuning and monitoring to do this in a clean
> way, so I just tried setting the max heap at 8GB and then 6GB to force
> garbage collection.  With it set to 6GB it goes into a long GC loop and
> then runs out of heap (see below).  The stack trace says the issue is with
> DocTermOrds.uninvert:
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
>
>  I'm guessing the actual peak is somewhere between 6 and 8 GB.
>
> BTW: is there some documentation somewhere that explains what the stats
> output to INFO mean?
>
> Tom
>


Re: Estimating peak memory use for UnInvertedField faceting

Posted by Tom Burton-West <tb...@umich.edu>.
Hi Yonik,

I don't know enough about JVM tuning and monitoring to do this in a clean
way, so I just tried setting the max heap at 8GB and then 6GB to force
garbage collection.  With it set to 6GB it goes into a long GC loop and
then runs out of heap (see below).  The stack trace says the issue is with
DocTermOrds.uninvert:
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)

 I'm guessing the actual peak is somewhere between 6 and 8 GB.
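
(A less invasive way to get at the peak than bisecting -Xmx, assuming a
HotSpot JVM and its stock GC-logging flags, with sizes and paths illustrative,
would be to log collections and read the old-generation occupancy around the
facet request:

-Xmx16g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log

The old-generation lines in that log, whose exact label depends on the
collector, bracket the transient footprint more tightly than watching total
heap from the outside.)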

BTW: is there some documentation somewhere that explains what the stats
output to INFO mean?

Tom


java.lang.OutOfMemoryError: GC overhead limit exceeded</str><str
name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: GC
overhead limit exceeded
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:517)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:252)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
... 16 more
</str>

---
Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField <init>
INFO: UnInverted multi-valued field {field=topicStr,
memSize=1,768,101,824,
tindexSize=86,028,
time=45,854,
phase1=41,039,
nTerms=271,987,
bigTerms=0,
termInstances=569,429,716,
uses=0}
Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute

INFO: [core] webapp=/dev-3 path=/select
params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml}
hits=138,605,690 status=0 QTime=49,797



On Fri, Nov 8, 2013 at 2:01 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West <tb...@umich.edu>
> wrote:
> > When testing an index of about 200 million documents, when we do a first
> > faceting on one field (query appended below), the memory use rises from
> > about 2.5 GB to 13GB.  If I run GC after the query the memory use goes
> down
> > to about 3GB and subsequent queries don't significantly increase the
> memory
> > use.
>
> Is there a way to tell what the real max memory usage is?  I assume
> 13GB is just the peak heap usage, but that could include a lot of
> garbage.
>
> -Yonik
> http://heliosearch.com -- making solr shine
>

Re: Estimating peak memory use for UnInvertedField faceting

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West <tb...@umich.edu> wrote:
> When testing an index of about 200 million documents, when we do a first
> faceting on one field (query appended below), the memory use rises from
> about 2.5 GB to 13GB.  If I run GC after the query the memory use goes down
> to about 3GB and subsequent queries don't significantly increase the memory
> use.

Is there a way to tell what the real max memory usage is?  I assume
13GB is just the peak heap usage, but that could include a lot of
garbage.
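
One way to get at that, a sketch rather than anything Solr exposes directly:
the java.lang.management pool MXBeans track a per-pool high-water mark, so
something like the following, run inside the Solr JVM (or adapted to read the
same MXBeans remotely over JMX), reports the real peak per pool; resetting
the peaks just before the facet request isolates that one query. The
old-generation pool is the interesting one; its name depends on the
collector, e.g. "PS Old Gen" or "CMS Old Gen".

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapPeakReport {

    /** Print the high-water mark of each memory pool since the last reset. */
    public static void report() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage peak = pool.getPeakUsage();
            if (peak != null) {
                System.out.printf("%-25s peak used = %,d bytes%n",
                        pool.getName(), peak.getUsed());
            }
        }
    }

    /** Clear the recorded peaks, e.g. right before issuing the facet query. */
    public static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage();
        }
    }
}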

-Yonik
http://heliosearch.com -- making solr shine

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org