Posted to solr-user@lucene.apache.org by vybe3142 <vy...@gmail.com> on 2012/03/21 22:42:02 UTC

Solr / Tika crashing when attempting to index large files

While waiting for someone to help answer my multicore config issue :), I
decided to test Solr's limits on a single instance/core config.

We occasionally need to index large text files (that must not be broken up).
This results in an out-of-memory error. I tried increasing Tomcat's heap
size to 1 GB, but this didn't help. Is there an alternative approach that
addresses this issue?

The following trace was caused by an attempt to index a 300 MB text file.

Mar 21, 2012 4:38:33 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuilder.append(StringBuilder.java:189)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:292)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:222)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:294)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:132)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
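
The top frames of the trace (StringBuilder.append -> expandCapacity -> Arrays.copyOf) suggest the extracted text is being accumulated in one in-memory buffer. A rough back-of-the-envelope sketch of why 1 GB may not be enough for a 300 MB file follows. It assumes, and these are assumptions rather than anything stated in the thread, a JVM of that era where StringBuilder is backed by a char[] at 2 bytes per char, a buffer that grows to (capacity + 1) * 2 when full, single-byte text (so 300 MB is about 300M chars), and an initial capacity of 16. The class and method names are made up for illustration.

```java
class BuilderHeapEstimate {
    // Assumption: char[]-backed StringBuilder, 2 bytes per char.
    static final long BYTES_PER_CHAR = 2;

    /**
     * Simulates the buffer doubling until it can hold totalChars and
     * returns the largest number of bytes live at any one time.
     */
    static long peakBytes(long totalChars, long initialCapacity) {
        long capacity = initialCapacity;
        long peak = capacity * BYTES_PER_CHAR;
        while (capacity < totalChars) {
            // Assumed growth rule: new capacity = (old + 1) * 2.
            long grown = (capacity + 1) * 2;
            // While the copy runs, the old and the new array are both live.
            peak = Math.max(peak, (capacity + grown) * BYTES_PER_CHAR);
            capacity = grown;
        }
        return peak;
    }

    public static void main(String[] args) {
        long chars = 300L * 1024 * 1024; // ~300 MB of single-byte text
        long peak = peakBytes(chars, 16);
        System.out.println("estimated peak MB for the buffer alone: "
                + peak / (1024 * 1024));
    }
}
```

Under these assumptions the estimate for the buffer alone comes out well above 1 GB, before counting anything else Solr or Tika holds on the heap, which would be consistent with the error persisting after the heap increase.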



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-tp3846939p3846939.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr / Tika crashing when attempting to index large files

Posted by Erick Erickson <er...@gmail.com>.
Why stop at 1G? But no, there's no alternative: it's really all-or-nothing
when you blast a file at Solr, so the whole thing has to fit in memory. But
be sure you're bumping the _solr_ heap, not just Tomcat's heap.

Best
Erick
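
One way to check whether a heap setting actually took effect is to ask the JVM itself. The sketch below only reports on the JVM it runs in, so to say anything about Solr the same Runtime call would have to be made from code running inside the Tomcat process that hosts Solr. The class name HeapCheck is made up for illustration. With Tomcat, heap flags such as -Xmx are commonly passed through the CATALINA_OPTS or JAVA_OPTS environment variables, but how this particular installation is launched is not stated in the thread.

```java
class HeapCheck {
    /** Maximum heap the current JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run as e.g. "java -Xmx1g HeapCheck" to see the effect of the flag.
        System.out.println("max heap MB: " + maxHeapMb());
    }
}
```

If the number reported from inside the servlet container does not change after editing the startup options, the setting is probably not reaching the JVM that runs Solr.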