Posted to solr-user@lucene.apache.org by Pranav Prakash <pr...@gmail.com> on 2011/12/09 09:41:32 UTC

Highlighting uses lots of memory and eventually slows down Solr

Hi Group,

I would like to enable highlighting for search. The fields are indexed
with the following schema (Solr 3.4):

<fieldType name="text_commongrams" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>

<field name="transcript" type="text_commongrams" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<dynamicField name="*_en" type="text_commongrams" indexed="true" stored="true"
              termVectors="true" termPositions="true" termOffsets="true"/>

And the following highlighting config:

<highlighting>
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
              default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">20</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
             default="true">
    <lst name="defaults">
      <str name="hl.simple.pre">
        <![CDATA[ <strong> ]]>
      </str>
      <str name="hl.simple.post">
        <![CDATA[ </strong> ]]>
      </str>
    </lst>
  </formatter>
</highlighting>

The problem is that when I turn on highlighting, I run into memory issues.
Memory usage on the system climbs steadily until nearly all of it is
consumed (I don't receive OOM errors; about 300 MB always remains free).
The machine has 48 GiB of memory in total. My index size is 138 GiB and
there are about 10 million documents in the index.

I also get the following warning, but I am not sure how to make the change
it asks for.

WARNING: Deprecated syntax found. <highlighting/> should move to
<searchComponent/>
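
A minimal sketch of what the warning appears to ask for, assuming the layout
used in the example solrconfig.xml shipped with Solr 3.x: the existing
<highlighting> block is nested inside a <searchComponent> backed by
solr.HighlightComponent. The component name "highlight" is an assumption
taken from that example, and the regex fragmenter is left out only for
brevity.

```xml
<!-- Sketch only: assumes the searchComponent layout from the Solr 3.x
     example solrconfig.xml; the name "highlight" is taken from that example. -->
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
                default="true">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>
    <!-- The "regex" fragmenter from the config above would go here unchanged. -->
    <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
               default="true">
      <lst name="defaults">
        <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
        <str name="hl.simple.post"><![CDATA[</strong>]]></str>
      </lst>
    </formatter>
  </highlighting>
</searchComponent>
```

This only relocates the configuration to silence the deprecation warning; it
is not expected to change highlighting behaviour or memory usage by itself.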

My Solr log with highlighting turned on looks something like this:

[core0] webapp=/solr path=/select
params={mm=3<90%25&qf=title^2&hl.simple.pre=<strong>&hl.fl=title,transcript,transcript_en&wt=ruby&hl=true&rows=12&defType=dismax&fl=id,title,description&debugQuery=false&start=0&q=asdfghjkl&bf=recip(ms(NOW,created_at),1.88e-11,1,1)&hl.simple.post=</strong>&ps=50}

Any help on this would be greatly appreciated. Thanks in advance!

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>

Re: Highlighting uses lots of memory and eventually slows down Solr

Posted by Pranav Prakash <pr...@gmail.com>.
No response! Bumping this up.


