
Posted to solr-user@lucene.apache.org by Erwin Gunadi <fe...@gmail.com> on 2014/02/25 14:23:53 UTC

Performance problem on Solr query on stemmed values

Hi,

 

I would like to know whether anyone has experienced this kind of phenomenon.

We are having a performance problem with queries on stemmed values.

I've documented the symptoms I'm currently seeing:

 


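Search on field content | Search on field spell | Highlighting (on content field) | Processing speed
------------------------+-----------------------+---------------------------------+-----------------
active                  | active                | active                          | Slow
active                  | not active            | active                          | Fast
active                  | active                | not active                      | Fast
not active              | active                | active                          | Slow
not active              | active                | not active                      | Fast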

 

*Fast means 1000x faster than "slow".

 

The content field is our index field, which holds the original text, and
spell is the field with the stemmed values.

According to my measurements, searching on both fields (stemmed and not
stemmed) is really fast.

But as soon as I add highlighting to the query, it takes too long to
process.

 

Best Regards

Erwin

RE: Performance problem on Solr query on stemmed values

Posted by Erwin Gunadi <fe...@gmail.com>.

Hi Erick,

Thank you for the reply.
Yes, I'm using the fast vector highlighter (Solr 4.3). Every request should
only deliver 10 results.

Here is my schema configuration for both fields:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.WordDelimiterFilterFactory"
			catenateWords="1" catenateNumbers="1" catenateAll="1"
			preserveOriginal="1" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
	<analyzer type="multiterm">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
</fieldType>
<fieldType name="textSpell" class="solr.TextField"
	positionIncrementGap="100" omitNorms="true">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="German2" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.StopFilterFactory" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
		<filter class="solr.ShingleFilterFactory" />
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="German2" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.StandardFilterFactory" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
	</analyzer>
	<analyzer type="multiterm">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
</fieldType> 
<field name="spell" type="textSpell" indexed="true" multiValued="true" />
<field name="content" type="text" stored="true" indexed="true"
	multiValued="true" termVectors="true" termPositions="true"
	termOffsets="true" />

The content field contains on average around 5000-6000 words (a rough
estimate only).

Best regards
Erwin




-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, February 25, 2014 3:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance problem on Solr query on stemmed values

Right, highlighting may have to re-analyze the input in order to return the
highlighted data. This will be significantly slower than the search,
especially if you have a large number of rows you're returning.

You can get better performance in highlighting by using
FastVectorHighlighter. See:

https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter

1000x is unusual, though, unless your fields are very large or you're
returning a lot of documents.

Best,
Erick



Re: Performance problem on Solr query on stemmed values

Posted by Erick Erickson <er...@gmail.com>.

Right, highlighting may have to re-analyze the input in order
to return the highlighted data. This will be significantly slower
than the search, especially if you have a large number
of rows you're returning.

You can get better performance in highlighting by using
FastVectorHighlighter. See:

https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter

1000x is unusual, though, unless your fields are very large or
you're returning a lot of documents.
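
As a rough illustration of the suggestion above, the highlighter can be
selected through request handler defaults in solrconfig.xml. This is only a
sketch assuming Solr 4.x: the /select handler name, the content field and
the rows value are placeholders taken from this thread, not a tested
configuration.

```xml
<!-- Sketch only: handler name and values are illustrative assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Turn highlighting on and restrict it to the content field. -->
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <!-- Ask for the FastVectorHighlighter; the field must be stored and
         indexed with termVectors, termPositions and termOffsets. -->
    <str name="hl.useFastVectorHighlighter">true</str>
    <!-- Keep the number of returned (and highlighted) documents small. -->
    <int name="rows">10</int>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request, for example
hl=true&hl.fl=content&hl.useFastVectorHighlighter=true.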

Best,
Erick
