Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2012/06/03 14:25:01 UTC

Re: Wildcard-Search Solr 3.5.0

Chiming in late here, just back from vacation. But off the top of my
head, I don't see any reason SnowballPorterFilterFactory shouldn't
be MultiTermAware.

I've created https://issues.apache.org/jira/browse/SOLR-3503 as
a placeholder.

Erick

On Fri, May 25, 2012 at 1:31 PM,  <sp...@gmx.eu> wrote:
>> I don't know the specific rules in these specific stemmers,
>> but generally a
>> "less aggressive" stemming (e.g., "plural-only") of
>> "paintings" would be
>> "painting", while a "more aggressive" stemming would be
>> "paint". For some
>> "aggressive" stemmers the stemmed word is not even a word.
>
> Sounds logical :)
>
>> It would be nice to have docs with some example words for each stemmer.
>
> Absolutely!
>
> Thanks a lot!
>

Re: Wildcard-Search Solr 3.5.0

Posted by Erick Erickson <er...@gmail.com>.
And I closed the JIRA; see the comments there. The short form is that
it's not worth the effort because of the edge cases. Jack writes
up some of them. The main one: what does stemming
do with a term like organiz*? Sure, it would produce one token (which is
the main restriction on a MultiTermAware filter), but the output
might not be anything equivalent to the stem of "organization", maybe
not even of "organize". Better to avoid that rat-hole; it seems like one of
those problems that could suck up enormous amounts of time and _still_ not
do what's expected.
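
To make the organiz* case concrete, here is a minimal sketch in Python. It
uses a made-up suffix stripper (toy_stem, with a hypothetical rule list), NOT
the real Snowball/Porter algorithm, so treat the exact stems as assumptions.
The point is only that a wildcard fragment is not a word the stemmer
recognizes, so stemming it need not line up with what was indexed.

```python
# Toy illustration only: a crude suffix stripper standing in for a real
# stemmer. These rules are assumptions, not the Snowball/Porter rules.
SUFFIXES = ("izations", "ization", "izing", "ized", "izes", "ize")


def toy_stem(token):
    """Strip the longest matching suffix, keeping at least 3 characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prefix_matches(prefix, indexed_terms):
    """Return the indexed terms that a prefix query would match."""
    return sorted(t for t in indexed_terms if t.startswith(prefix))


# Index time: whole words are stemmed before they are stored.
docs = ["organization", "organize", "organizing"]
index = {toy_stem(word) for word in docs}  # every word collapses to "organ"

# Query time: for organiz* only the fragment "organiz" reaches analysis.
fragment = "organiz"
unstemmed_hits = prefix_matches(fragment, index)          # []
stemmed_hits = prefix_matches(toy_stem(fragment), index)  # still []

# No suffix rule fires on "organiz", so the stemmer leaves it unchanged and
# the query misses the indexed stem "organ" either way.
print(index, unstemmed_hits, stemmed_hits)
```

In this toy setup only a shorter query such as organ* would hit the indexed
term, which is the kind of surprise a stemmer in the multiterm chain cannot
reliably remove.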

If you _really_ want to try this, you could always define your own
"multiterm" analysis chain that includes the stemmer; see:
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
But don't say I didn't warn you <G>...

Best
Erick
