You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Vitaliy Zhovtyuk (JIRA)" <ji...@apache.org> on 2014/08/05 22:51:14 UTC
[jira] [Updated] (SOLR-3881) frequent OOM in
LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitaliy Zhovtyuk updated SOLR-3881:
-----------------------------------
Attachment: SOLR-3881.patch
About moving concatFields() to the tika language identifier: I think the way to go is just move the whole method there, then change the detectLanguage() method to take the SolrInputDocument instead of a String. You don't need to carry over the field[] parameter from concatFields(), since data member inputFields will be accessible everywhere it's needed.
[VZ] This call looks more cleaner now, i changed inputFields to private now to reduce visibility scope
I should have mentioned previously: I don't like the maxAppendSize and maxTotalAppendSize names - "size" is ambiguous (could refer to bytes, chars, whatever), and "append" refers to an internal operation... I'd like to see "append"=>"field value" and "size"=>"chars": maxFieldValueChars, and maxTotalChars (since appending doesn't need to be mentioned for the global limit). The same thing goes for the default constants and the test method names.
[VZ] Renamed parameters and test methods
Some minor issues I found with your patch:
As I said previously: "We should also set default maxima for both per-value and total chars, rather than MAX_INT, as in the current patch."
The total chars default should be its own setting; I was thinking we could make it double the per-value default?
[VZ] added default value to maxTotalChars and changed both to 10K like in com.cybozu.labs.langdetect.Detector.maxLength
It's better not to reorder import statements unless you're already making significant changes to them; it distracts from the meat of the change. (You reordered them in LangDetectLanguageIdentifierUpdateProcessor and LanguageIdentifierUpdateProcessorFactoryTestCase)
[VZ] This is IDE optimization to put imports in alphabetical order - restored it to original order
In LanguageIdentifierUpdateProcessor.concatFields(), when you trim the concatenated text to maxTotalAppendSize, I think StringBuilder.setLength(maxTotalAppendSize); would be more efficient than StringBuilder.delete(maxTotalAppendSize, sb.length() - 1);
[VZ] Yep, cleaned that
In addition to the test you added for the global limit, we should also test using both the per-value and global limits at the same time.
[VZ] Tests for both limits added
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
> Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org