You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Vitaliy Zhovtyuk (JIRA)" <ji...@apache.org> on 2014/08/05 22:51:14 UTC
[jira] [Updated] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

     [ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitaliy Zhovtyuk updated SOLR-3881:
-----------------------------------

    Attachment: SOLR-3881.patch

About moving concatFields() to the tika language identifier: I think the way to go is just move the whole method there, then change the detectLanguage() method to take the SolrInputDocument instead of a String. You don't need to carry over the field[] parameter from concatFields(), since data member inputFields will be accessible everywhere it's needed.
[VZ] This call looks more cleaner now, i changed inputFields to private now to reduce visibility scope

I should have mentioned previously: I don't like the maxAppendSize and maxTotalAppendSize names - "size" is ambiguous (could refer to bytes, chars, whatever), and "append" refers to an internal operation... I'd like to see "append"=>"field value" and "size"=>"chars": maxFieldValueChars, and maxTotalChars (since appending doesn't need to be mentioned for the global limit). The same thing goes for the default constants and the test method names.
[VZ] Renamed parameters and test methods

Some minor issues I found with your patch:
As I said previously: "We should also set default maxima for both per-value and total chars, rather than MAX_INT, as in the current patch."
The total chars default should be its own setting; I was thinking we could make it double the per-value default?
[VZ] added default value to maxTotalChars and changed both to 10K like in com.cybozu.labs.langdetect.Detector.maxLength

It's better not to reorder import statements unless you're already making significant changes to them; it distracts from the meat of the change. (You reordered them in LangDetectLanguageIdentifierUpdateProcessor and LanguageIdentifierUpdateProcessorFactoryTestCase)
[VZ] This is IDE optimization to put imports in alphabetical order - restored it to original order

In LanguageIdentifierUpdateProcessor.concatFields(), when you trim the concatenated text to maxTotalAppendSize, I think StringBuilder.setLength(maxTotalAppendSize); would be more efficient than StringBuilder.delete(maxTotalAppendSize, sb.length() - 1);
[VZ] Yep, cleaned that 

In addition to the test you added for the global limit, we should also test using both the per-value and global limits at the same time.
[VZ] Tests for both limits added

> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
>                 Key: SOLR-3881
>                 URL: https://issues.apache.org/jira/browse/SOLR-3881
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
>            Reporter: Rob Tulloh
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuffer.append(StringBuffer.java:224)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org