Posted to solr-user@lucene.apache.org by "Jamal, Sarfaraz" <Sa...@VerizonWireless.com.INVALID> on 2016/06/16 19:58:26 UTC

RE: [E] Re: Stemming

Hi Ahmet,

Thanks for your guidance.

I just tried the following two configurations:

  <fieldType name="text_stem" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

And

  <fieldType name="text_stem" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
    </analyzer>
  </fieldType>

With both configurations, searching for run, runs, and running still produced three different sets of results.

-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID] 
Sent: Thursday, June 16, 2016 3:37 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming



Hi Jamal,

Snowball requires a lowercase filter above it.
This is documented in the javadocs; it is a small but important detail.
Please add a lowercase filter after the whitespace tokenizer.
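
As a minimal sketch of that ordering, assuming the text_stem field type from the original question is otherwise left unchanged:

  <fieldType name="text_stem" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- Lowercase first, so the stemmer only ever sees lowercase tokens -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

Since a single analyzer element like this applies at both index and query time, documents indexed before the change would likely need to be reindexed before the new terms show up in searches.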


Ahmet
On Thursday, June 16, 2016 10:13 PM, "Jamal, Sarfaraz" <Sa...@VerizonWireless.com.INVALID> wrote:



Hi Guys,

I have enabled stemming:
  <fieldType name="text_stem" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

In the Admin Analysis screen, I type in running or runs, and both are stemmed to run.
However, when I search for run, runs, or running with an actual query, it brings back three different sets of results.

Is that correct?

I would expect all three to bring back the exact same result set.

Sas