Posted to solr-dev@lucene.apache.org by "Gargate, Siddharth" <sg...@ptc.com> on 2009/05/04 06:36:51 UTC

RE: OutOfMemory on Highlighting

Hi all,

I tried a few changes in the DefaultSolrHighlighter.doHighlighting method
to avoid OOM errors. The code changes work fine with a 256 MB max heap.

.....
     // searcher.readDocs(readDocs, docs, fset);
     // Commented out the readDocs call; this method was fetching the
     // stored fields for all rows up front.
.....

// Highlight each document
    DocIterator iterator = docs.iterator();
    for (int i = 0; i < docs.size(); i++) {
       int docId = iterator.nextDoc();
       // Document doc = readDocs[i];
       // Instead of reading the Document from the readDocs array, fetch
       // each Document one at a time (note: by docId, not the loop index).
       Document doc = searcher.doc(docId, fset);

....
....

With the above changes, memory usage is dramatically reduced.
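
Putting the pieces together, here is a minimal self-contained sketch of the
per-document pattern (the class and method names are illustrative, not from
the patch; searcher, docs, and fset are as in the snippet above):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

public class PerDocumentHighlightSketch {
  void highlightAll(SolrIndexSearcher searcher, DocList docs, Set<String> fset)
      throws IOException {
    DocIterator iterator = docs.iterator();
    for (int i = 0; i < docs.size(); i++) {
      int docId = iterator.nextDoc();
      // Fetch one Document at a time instead of pre-loading all rows,
      // so only the current document's stored fields are held in memory.
      Document doc = searcher.doc(docId, fset);
      // ... highlight this document; it becomes garbage on the next pass ...
    }
  }
}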

One more change was required so that highlighting on the alternate field
also works without OOM.

...
altList.add( len + altText.length() > alternateFieldLen ?
             altText.substring( 0, alternateFieldLen - len ) : altText );
...

Modified the above line to:

altList.add( len + altText.length() > alternateFieldLen ?
             new String( altText.substring( 0, alternateFieldLen - len ) ) : altText );

The substring is wrapped in a new String object so that no reference to
the entire original string is retained.
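
To illustrate why the copy matters: on JVMs of this era (before Java 7u6),
String.substring() shares the parent string's backing char[] array, so a
200-character fragment can pin an entire 1 MB string in memory. The class
below is just a hypothetical demo of the two behaviors:

public class SubstringRetentionDemo {
  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder(1024 * 1024);
    for (int i = 0; i < 1024 * 1024; i++) sb.append('x');
    String oneMegabyteText = sb.toString();

    // Shares the 1 MB backing array: the fragment keeps it reachable.
    String shared = oneMegabyteText.substring(0, 200);

    // Copies only 200 chars: the 1 MB array can be garbage collected
    // once oneMegabyteText itself goes out of scope.
    String copied = new String(oneMegabyteText.substring(0, 200));

    System.out.println(shared.length() + " " + copied.length());
  }
}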


Please let me know if this is a valid fix. Should I open a JIRA issue for
it?

One issue I observed is that search now takes around 20-25 seconds, maybe
because we are reading 1 MB of text for each of 500 documents.

Thanks,
Siddharth


-----Original Message-----
From: Gargate, Siddharth [mailto:sgargate@ptc.com] 
Sent: Tuesday, April 28, 2009 4:35 PM
To: solr-user@lucene.apache.org; solr-dev@lucene.apache.org
Subject: RE: OutOfMemory on Highlighting

Is it possible to read only maxAnalyzedChars from the stored field instead
of reading the complete field into memory? For instance, in my case, is it
possible to read only the first 50K characters instead of the complete
1 MB of stored text? That would help minimize memory usage (though it
would still take 50K chars * 500 docs * 2 bytes/char = 50 MB for 500
results).
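
In the meantime, one workaround I am considering (a sketch only; the class
and method names are stand-ins, and fieldName/maxAnalyzedChars would come
from the request) is to cap the text immediately after the stored field is
read. This does not avoid reading the full 1 MB from disk, but it lets the
large string be collected right away:

import org.apache.lucene.document.Document;

public class TruncateForHighlighting {
  // Return at most maxAnalyzedChars of the stored field's text.
  static String textToAnalyze(Document doc, String fieldName,
                              int maxAnalyzedChars) {
    String raw = doc.get(fieldName); // may be up to 1 MB
    if (raw == null || raw.length() <= maxAnalyzedChars) {
      return raw;
    }
    // new String(...) forces a copy so the truncated fragment does not
    // pin the 1 MB backing array (pre-Java-7u6 behavior).
    return new String(raw.substring(0, maxAnalyzedChars));
  }
}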

I would really appreciate some feedback on this issue...

Thanks,
Siddharth


-----Original Message-----
From: Gargate, Siddharth [mailto:sgargate@ptc.com] 
Sent: Friday, April 24, 2009 10:46 AM
To: solr-user@lucene.apache.org
Subject: RE: OutOfMemory on Highlighting

I am not sure whether lazy loading should solve this problem; I have set
enableLazyFieldLoading to true, but it is not helping.

I went through the code and observed that
DefaultSolrHighlighter.doHighlighting reads all the documents and their
fields for highlighting up front (in my case, the 1 MB stored field is
read for every document).

I am also confused by the following code in the SolrIndexSearcher.doc()
method:

if(!enableLazyFieldLoading || fields == null) {
      d = searcher.getIndexReader().document(i);
    } else {
      d = searcher.getIndexReader().document(i, 
             new SetNonLazyFieldSelector(fields));
    }

Are we setting the fields as NonLazy even if lazy loading is enabled?
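
For what it is worth, my guess at what such a selector looks like is below
(a sketch assuming standard Lucene FieldSelector semantics, not the
verbatim Solr source). If this reading is right, the fields in the
requested set are loaded eagerly and everything else is deferred, so
"non-lazy" refers only to the explicitly requested fields:

import java.util.Set;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;

class SetNonLazyFieldSelectorSketch implements FieldSelector {
  private final Set<String> fieldsToLoad;

  SetNonLazyFieldSelectorSketch(Set<String> fieldsToLoad) {
    this.fieldsToLoad = fieldsToLoad;
  }

  public FieldSelectorResult accept(String fieldName) {
    return fieldsToLoad.contains(fieldName)
        ? FieldSelectorResult.LOAD       // requested field: read now
        : FieldSelectorResult.LAZY_LOAD; // everything else: defer
  }
}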

Thanks,
Siddharth

-----Original Message-----
From: Gargate, Siddharth [mailto:sgargate@ptc.com] 
Sent: Wednesday, April 22, 2009 11:12 AM
To: solr-user@lucene.apache.org
Subject: RE: OutOfMemory on Highlighting

Here is the stack trace

SEVERE: java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
        at java.lang.StringCoding.decode(StringCoding.java:173)
        at java.lang.String.<init>(String.java:444)
        at org.apache.lucene.store.IndexInput.readString(IndexInput.java:125)
        at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892)
        at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:277)
        at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176)
        at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457)
        at org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java:482)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:253)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)



-----Original Message-----
From: Gargate, Siddharth [mailto:sgargate@ptc.com] 
Sent: Wednesday, April 22, 2009 9:29 AM
To: solr-user@lucene.apache.org
Subject: RE: OutOfMemory on Highlighting

I tried disabling the documentCache, but I still hit the same issue.

<documentCache
      class="solr.LRUCache"
      size="0"
      initialSize="0"
      autowarmCount="0"/>



-----Original Message-----
From: Koji Sekiguchi [mailto:koji@r.email.ne.jp] 
Sent: Monday, April 20, 2009 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory on Highlighting

Gargate, Siddharth wrote:
> Is anybody else facing the same issue? Here is my configuration:
> ...
> <field name="content" type="text" indexed="true" stored="false"
> multiValued="true"/>
> <field name="teaser" type="text" indexed="false" stored="true"/>
> <copyField source="content" dest="teaser" maxChars="1000000" />
> ...
>
> ...
> <requestHandler name="standard" class="solr.SearchHandler"
>                 default="true">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">500</int>
>     <str name="hl">true</str>
>     <str name="fl">id,score</str>
>     <str name="hl.fl">teaser</str>
>     <str name="hl.alternateField">teaser</str>
>     <int name="hl.fragsize">200</int>
>     <int name="hl.maxAlternateFieldLength">200</int>
>     <int name="hl.maxAnalyzedChars">500</int>
>   </lst>
> </requestHandler>
> ...
>
> Search works fine if I disable highlighting, and it brings back 500
> results. But if I enable highlighting and set the number of rows to just
> 20, I get OOME.
>
>   
How about switching documentCache off?

Koji



RE: OutOfMemory on Highlighting

Posted by "Gargate, Siddharth" <sg...@ptc.com>.
I have opened a JIRA issue to track this:

https://issues.apache.org/jira/browse/SOLR-1150

