Posted to java-user@lucene.apache.org by Dwaipayan Roy <dw...@gmail.com> on 2016/03/14 15:31:10 UTC

Problem with porter stemming

I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
the Porter stemmer (Snowball) to stem words. However, with EnglishAnalyzer
I get an erroneous result for 'news': it is stemmed to 'new'.

Any help would be appreciated.

Re: Problem with porter stemming

Posted by Benson Margulies <be...@basistech.com>.
Stemming is an inherently limited process. It doesn't know about the
word 'news'; it just has a rule about a trailing 's'.

Some of us sell commercial products that do more complex linguistic
processing, which knows which words are which.

There may be open source implementations of similar technology.
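
Benson's point about a rule for 's' can be illustrated with a toy version of the plural-removal rules (step 1a) of the original Porter algorithm. This is a hypothetical plain-Java sketch, not Lucene's PorterStemmer or PorterStemFilter, and it covers only step 1a; it shows that a bare suffix rule cannot tell the uncountable noun 'news' apart from a regular plural such as 'cats'.

```java
/**
 * Hypothetical, simplified sketch of step 1a of the original Porter
 * algorithm (plural suffix removal only). Not Lucene's implementation.
 */
final class PorterStep1a {

    private PorterStep1a() {}

    /** Applies the step 1a rules, trying the longest suffix first. */
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // SSES -> SS: caresses -> caress
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // IES -> I: ponies -> poni
        }
        if (word.endsWith("ss")) {
            return word;                                 // SS -> SS: caress -> caress
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // S -> "": cats -> cat, news -> new
        }
        return word;
    }

    public static void main(String[] args) {
        String[] words = {"caresses", "ponies", "caress", "cats", "news"};
        for (String w : words) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

Nothing in the last rule looks at the word itself, which is why 'news' and 'cats' are treated identically.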


On Mon, Mar 14, 2016 at 12:13 PM, Ahmet Arslan
<io...@yahoo.com.invalid> wrote:
> Hi Dwaipayan,
>
> Another way is to use KeywordMarkerFilter (e.g. SetKeywordMarkerFilter). Stemmer implementations respect the keyword attribute it sets.
> If you want to supply your own mappings, StemmerOverrideFilter could be used as well.
>
> ahmet

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Problem with porter stemming

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Dwaipayan,

Another way is to use KeywordMarkerFilter (e.g. SetKeywordMarkerFilter). Stemmer implementations respect the keyword attribute it sets.
If you want to supply your own mappings, StemmerOverrideFilter could be used as well.

ahmet
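
Both suggestions work by flagging tokens so that the stemmer leaves them alone; the flag-setting step must run before the stem filter. The plain-Java sketch below imitates that idea outside Lucene. The Token class, the method names and the toy stemmer are hypothetical stand-ins, not Lucene's SetKeywordMarkerFilter, StemmerOverrideFilter or KeywordAttribute. In Lucene itself, EnglishAnalyzer also accepts a stem exclusion set as a constructor argument, which (per its Javadoc) adds a SetKeywordMarkerFilter before stemming, so passing a set containing "news" alongside the custom stopword list is probably the most direct fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StemProtectionSketch {

    /** Hypothetical token: a term plus a flag playing the role of a keyword attribute. */
    static final class Token {
        String term;
        boolean keyword;

        Token(String term) {
            this.term = term;
        }
    }

    /** Keyword-marker step: flag protected terms so the stemmer skips them. */
    static void markKeywords(List<Token> tokens, Set<String> protectedTerms) {
        for (Token t : tokens) {
            if (protectedTerms.contains(t.term)) {
                t.keyword = true;
            }
        }
    }

    /** Override step: replace a term with a fixed stem and flag it as a keyword. */
    static void applyOverrides(List<Token> tokens, Map<String, String> overrides) {
        for (Token t : tokens) {
            String stem = overrides.get(t.term);
            if (stem != null) {
                t.term = stem;
                t.keyword = true;
            }
        }
    }

    /** Toy stemmer (strips a plural 's' only) that respects the keyword flag. */
    static void stem(List<Token> tokens) {
        for (Token t : tokens) {
            if (!t.keyword && t.term.endsWith("s") && !t.term.endsWith("ss")) {
                t.term = t.term.substring(0, t.term.length() - 1);
            }
        }
    }

    /** Runs overrides and keyword marking before stemming, then returns the terms. */
    static List<String> analyze(List<String> words, Set<String> protectedTerms,
                                Map<String, String> overrides) {
        List<Token> tokens = new ArrayList<>();
        for (String w : words) {
            tokens.add(new Token(w));
        }
        applyOverrides(tokens, overrides);
        markKeywords(tokens, protectedTerms);
        stem(tokens);
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("news", "cats", "mice");
        System.out.println(analyze(words, Set.of(), Map.of()));                // [new, cat, mice]
        System.out.println(analyze(words, Set.of("news"), Map.of()));          // [news, cat, mice]
        System.out.println(analyze(words, Set.of(), Map.of("mice", "mouse"))); // [new, cat, mouse]
    }
}
```

A keyword set only protects a word from the stemmer, while an override map also lets you choose what the word becomes (for example an irregular plural).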


Re: Problem with porter stemming

Posted by Dwaipayan Roy <dw...@gmail.com>.
Hello.

I want to use LMJelinekMercerSimilarity (with lambda set to, say, 0.6) for
Luke's similarity calculation. By default, Luke uses DefaultSimilarity.
Can anyone help with this? I am using Lucene 4.10.4 and the matching
version of Luke.

Dwaipayan
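
To make the lambda = 0.6 setting concrete, here is a minimal plain-Java sketch of the per-term Jelinek-Mercer score. It follows my reading of the Lucene 4.x LMJelinekMercerSimilarity source (score = log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C))), with P(t|C) from the default collection model); Lucene computes this in float and multiplies by the query boost, both omitted here. The method names and the collection statistics in main are hypothetical. In application code (rather than in Luke), the similarity would be set with IndexSearcher.setSimilarity(new LMJelinekMercerSimilarity(0.6f)).

```java
/**
 * Hypothetical sketch of the per-term Jelinek-Mercer language-model score,
 * modelled on the formula in Lucene's LMJelinekMercerSimilarity (boost omitted).
 */
final class JelinekMercerSketch {

    private JelinekMercerSketch() {}

    /** Collection model P(t|C): (totalTermFreq + 1) / (numberOfFieldTokens + 1). */
    static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }

    /** log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C))). */
    static double score(double lambda, double tf, double docLen, double collectionProb) {
        return Math.log(1 + ((1 - lambda) * tf / docLen) / (lambda * collectionProb));
    }

    public static void main(String[] args) {
        double lambda = 0.6;                             // value from the question
        double pc = collectionProbability(999, 99_999);  // hypothetical stats: P(t|C) = 0.01
        // Term occurring 3 times in a 100-token document: log(1 + 0.012 / 0.006) = ln 3
        System.out.println(score(lambda, 3, 100, pc));   // about 1.0986
    }
}
```

A larger lambda gives more weight to the collection model, which shrinks the contribution of the document's own term frequency.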