Posted to solr-user@lucene.apache.org by sasarun <sa...@gmail.com> on 2017/08/09 14:15:47 UTC

Highlighting Performance improvement suggestions required - Solr 6.5.1

Hi All, 

I found quite a few discussions on the highlighting performance issue.
Though I tried to implement most of the suggestions, performance did not
improve.
The index is currently small, at about 922 records, but the field on which
highlighting is done holds quite large values. Queries with highlighting
take a long time, with 85-90% of the time spent on highlighting.
The configuration in my schema.xml is as below:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
<field name="customContent" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"
       storeOffsetsWithPositions="true"/>
<field name="customContent_term" type="text_general" indexed="false" stored="true"/>
<copyField source="customContent" dest="customContent_term"/>

The query used in Solr is:

hl=true&hl.fl=customContent&hl.fragsize=500&hl.simple.pre=<HL>&hl.simple.post=</HL>&hl.snippets=1&hl.method=unified&hl.bs.type=SENTENCE&hl.fragListBuilder=simple&hl.maxAnalyzedChars=214748364&facet=true&facet.mincount=1&facet.limit=-1&facet.sort=count&debug=timing&facet.field=contentSpecific
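
For anyone reproducing this from a script, the parameters above could be assembled roughly as follows. This is only a sketch: it uses Python's standard urllib, the parameter names and values are copied from the query above, and the q parameter and the Solr host are left out because they are not shown in this message. The point is that the <HL> and </HL> tags get percent-encoded.

```python
from urllib.parse import urlencode, parse_qs

# Highlighting and facet parameters copied from the query in this message.
# The q parameter and the Solr URL are not shown in the message, so they are omitted.
params = [
    ("hl", "true"),
    ("hl.fl", "customContent"),
    ("hl.fragsize", "500"),
    ("hl.simple.pre", "<HL>"),
    ("hl.simple.post", "</HL>"),
    ("hl.snippets", "1"),
    ("hl.method", "unified"),
    ("hl.bs.type", "SENTENCE"),
    ("hl.fragListBuilder", "simple"),
    ("hl.maxAnalyzedChars", "214748364"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    ("facet.limit", "-1"),
    ("facet.sort", "count"),
    ("debug", "timing"),
    ("facet.field", "contentSpecific"),
]

# urlencode percent-encodes the angle brackets and the slash in the tags.
query_string = urlencode(params)
print(query_string)

# Decoding the string again should give back the original tag values.
decoded = parse_qs(query_string)
print(decoded["hl.simple.pre"], decoded["hl.simple.post"])
```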

Also note that we tried the FastVectorHighlighter too, but the result was not
positive. Once, when we set hl.offsetSource="term_vectors" with the unified
highlighter, the result came back in half a second, but it did not contain
any highlight snippets.

One of the debug timing outputs returned by Solr is shared below for reference:

time=8833.0,
prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},
process={time=8826.0,query={time=867.0},facet={time=2.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=7953.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},
loadFieldValues={time=28.0}}
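
To put the timings above in proportion, here is a small sketch of the arithmetic. The values are copied by hand from the debug output (in milliseconds); the variable names are just for illustration.

```python
# Timings in milliseconds, copied from the debug=timing output in this message.
total_ms = 8833.0
process_ms = {"query": 867.0, "facet": 2.0, "highlight": 7953.0}

# Share of the total request time spent in each non-zero process component.
shares = {name: 100.0 * ms / total_ms for name, ms in process_ms.items()}

for name, share in shares.items():
    print(f"{name}: {process_ms[name]:.0f} ms ({share:.1f}% of total)")
```

For this request, highlighting comes out at about 90% of the total time, which matches the 85-90% figure mentioned earlier.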

Any suggestions to improve the performance would be of great help.

Thanks, 
Arun



--
View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-Performance-improvement-suggestions-required-Solr-6-5-1-tp4349767.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Highlighting Performance improvement suggestions required - Solr 6.5.1

Posted by Michael Braun <n3...@gmail.com>.
Have you attached JVisualVM or a similar sampling tool while Solr is
answering the highlighting requests? Which relevant methods come up?


Re: Highlighting Performance improvement suggestions required - Solr 6.5.1

Posted by sasarun <sa...@gmail.com>.
Hi Amrit, 

Thanks for the response. I did go through both, and that is how I ended up
with the unified method for the highlighter.

Thanks,
Arun




Re: Highlighting Performance improvement suggestions required - Solr 6.5.1

Posted by Amrit Sarkar <sa...@gmail.com>.
Pardon me, I didn't go through the details of your configs. I guess you have
already gone through the recent talks on highlighters, but I am sharing them
in case you haven't:

https://www.slideshare.net/lucidworks/solr-highlighting-at-full-speed-presented-by-timothy-rodriguez-bloomberg-david-smiley-d-w-smiley-llc
https://www.youtube.com/watch?v=tv5qKDKW8kk

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
