Posted to dev@lucene.apache.org by "Trey Jones (JIRA)" <ji...@apache.org> on 2018/07/20 14:33:00 UTC

[jira] [Created] (LUCENE-8416) Add tokenized version of o.o. to Stempel stopwords

Trey Jones created LUCENE-8416:
----------------------------------

             Summary: Add tokenized version of o.o. to Stempel stopwords
                 Key: LUCENE-8416
                 URL: https://issues.apache.org/jira/browse/LUCENE-8416
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Trey Jones


The Stempel stopword list ( lucene-solr/lucene/analysis/stempel/src/resources/org/apache/lucene/analysis/pl/stopwords.txt ) contains "o.o.", which is a good stopword (it's part of the abbreviation for "limited liability company", which is "[sp. z o.o.|https://en.wiktionary.org/wiki/sp._z_o.o.]"). However, the standard tokenizer changes "o.o." to "o.o", so the token never matches the stopword entry and the stop filter has no effect on it.

Add "o.o" to the stopword list. (It's probably okay to leave "o.o." in the list, though, in case a different tokenizer is used.)
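
The mismatch can be illustrated with a small sketch. This is not Lucene code: simple_tokenize is a hypothetical, simplified stand-in for the standard tokenizer (it only lowercases, splits on whitespace, and strips leading/trailing punctuation), and the stopword sets are one-entry excerpts invented for the example rather than the real contents of stopwords.txt.

```python
# Hypothetical stand-in for the standard tokenizer (assumption, not Lucene's
# implementation): lowercase, split on whitespace, and drop punctuation at the
# edges of each token while keeping periods that sit between letters.
def simple_tokenize(text):
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if token:
            tokens.append(token)
    return tokens


# Simplified stop filter: drop any token that appears in the stopword set.
def remove_stopwords(tokens, stopwords):
    return [t for t in tokens if t not in stopwords]


# Illustrative one-entry excerpts, not the full Stempel list.
current_stopwords = {"o.o."}
proposed_stopwords = current_stopwords | {"o.o"}

tokens = simple_tokenize("Firma sp. z o.o.")

# With only "o.o." listed, the token "o.o" survives filtering.
kept_with_current = remove_stopwords(tokens, current_stopwords)

# With "o.o" added, the token is removed as intended.
kept_with_proposed = remove_stopwords(tokens, proposed_stopwords)

print(tokens)
print(kept_with_current)
print(kept_with_proposed)
```

Keeping "o.o." in the set alongside "o.o" costs nothing in this sketch and still matches if a tokenizer preserves the trailing period.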


