Posted to solr-user@lucene.apache.org by Andrew Stromnov <st...@gmail.com> on 2007/07/10 09:32:30 UTC

Stemmer bug?

Working config (with proper Russian stemming):

    <fieldType name="text" class="solr.TextField">
      <analyzer type="index" class="org.apache.lucene.analysis.ru.RussianAnalyzer"></analyzer>
      <analyzer type="query" class="org.apache.lucene.analysis.ru.RussianAnalyzer"></analyzer>
    </fieldType>


Non-working config (no Russian stemming):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripStandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="Russian" />
      </analyzer>
      <analyzer type="query" class="org.apache.lucene.analysis.ru.RussianAnalyzer">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="Russian" />
      </analyzer>
    </fieldType>

-- 
View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11516099
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stemmer bug?

Posted by Andrew Stromnov <st...@gmail.com>.
Hi

RussianAnalyzer produces Russian stemmed forms, but
SnowballPorterFilterFactory with language="Russian" leaves _all_ Russian
content unchanged.


hossman wrote:
> 
> 
> : Subject: Stemmer bug?
> 
> can you elaborate on what exactly you view as a bug?
> 
> if the issue is just that one of the examples stems something in a way
> that you think makes sense, but the other one does not, that really isn't a
> bug so much as it is a comment on the effectiveness of the Snowball
> Stemmer for Russian vs the RussianStemmer class used by the
> RussianAnalyzer.  if you like the stemming that comes out of the
> RussianAnalyzer you can use the RussianStemFilter yourself by creating a
> simple FilterFactory around it (there are lots of examples in the Solr
> code base)
> 
> Also keep in mind that the Snowball Stemmer is not designed to produce
> "real" words when it stems ... it's an algorithmic stemmer designed to
> produce artificial stems for common cases ... so if you think it's a bug
> because it produces terms that aren't real words -- it's not, that's just
> the way it works -- what matters is that it produces the same artificial
> stem for related words.
> 
> -Hoss
> 



Re: Stemmer bug?

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Stemmer bug?

can you elaborate on what exactly you view as a bug?

if the issue is just that one of the examples stems something in a way
that you think makes sense, but the other one does not, that really isn't a
bug so much as it is a comment on the effectiveness of the Snowball
Stemmer for Russian vs the RussianStemmer class used by the
RussianAnalyzer.  if you like the stemming that comes out of the
RussianAnalyzer you can use the RussianStemFilter yourself by creating a
simple FilterFactory around it (there are lots of examples in the Solr
code base)
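
As a rough, untested sketch of how such a factory might be wired into
schema.xml once it exists: the class name com.example.RussianStemFilterFactory
and the field type name "text_ru" are hypothetical placeholders (neither ships
with Solr), and the tokenizer and lowercase filter are simply reused from the
configs earlier in the thread rather than being a recommendation.

    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
      <!-- same chain at index and query time, so both sides produce the same stems -->
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- hypothetical: your own factory wrapping RussianStemFilter -->
        <filter class="com.example.RussianStemFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- hypothetical: same placeholder factory as above -->
        <filter class="com.example.RussianStemFilterFactory" />
      </analyzer>
    </fieldType>

The point of repeating the chain in both analyzers is only that a query term
and the indexed term go through the same stemmer; whether this particular
chain stems Russian text the way RussianAnalyzer does would need checking
against your own data.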

Also keep in mind that the Snowball Stemmer is not designed to produce
"real" words when it stems ... it's an algorithmic stemmer designed to
produce artificial stems for common cases ... so if you think it's a bug
because it produces terms that aren't real words -- it's not, that's just
the way it works -- what matters is that it produces the same artificaial
stem for related words.



-Hoss