Posted to solr-user@lucene.apache.org by Román González <rg...@normagricola.com> on 2014/05/05 13:00:47 UTC

Wildcard malfunctioning

Hi all!

Sorry in advance if this question has already been asked, but I was unable to
find it with search engines.

The SpanishLightStemFilterFactory filter is not working properly with wildcards,
or I’m misunderstanding something. I have this field:

   <field name="cultivo_es" type="text_es" indexed="true" stored="true" />

With this type:

    <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
        <filter class="solr.SpanishLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
      </analyzer>
    </fieldType>

But I’m getting these results:

q = cultivo_es:uva
Returns 50 correct results

q = cultivo_es:uva*
Returns the same 50 correct results

q = cultivo_es:naranja
Returns the 50 correct results for “naranja”

q = cultivo_es:naranja*
Returns 0 results!

It works fine if I remove the SpanishLightStemFilterFactory filter, but I need
it in order to handle diacritics according to Spanish rules.

Thank you!!

 


RE: Wildcard malfunctioning

Posted by Román González <rg...@normagricola.com>.
SOLVED!

The first solution I tried (Ahmet's) worked fine!

Thank you!


Re: Wildcard malfunctioning

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/5/2014 5:19 AM, Jack Krupansky wrote:
> But, you say that you are using the stemmer to remove diacritical
> marks... you can/should use ASCIIFoldingFilterFactory or
> MappingCharFilterFactory.

I like ICUFoldingFilterFactory for this, but it does require additional
contrib jars (included in the Solr download).  It lowercases too.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
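
As a rough sketch of how this might look in schema.xml, assuming the ICU analysis contrib jars are already on the classpath. The field type name text_es_icu is hypothetical and does not come from this thread; the stop word settings are copied from the original question.

```xml
<!-- Hypothetical field type; the name text_es_icu is an assumption for illustration. -->
<fieldType name="text_es_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Stop words are removed before folding so that accented entries in the list still match. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball"/>
    <!-- ICU folding strips diacritics and also lowercases, so no separate LowerCaseFilterFactory is used. -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

There is no stemmer in this chain, so a wildcard term such as naranja* is compared against unstemmed indexed terms. Folding of this kind is expected to map ñ to n as well, which may or may not be acceptable for Spanish text.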

Thanks,
Shawn


Re: Wildcard malfunctioning

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I mark all the filters that support wildcards with (multi) on my list:
http://www.solr-start.com/info/analyzers/ . I use the actual interface
markers to derive that list, so it should be the most up to date.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency



Re: Wildcard malfunctioning

Posted by Jack Krupansky <ja...@basetechnology.com>.
Generally, stemming filters are not supported when wildcards are present. 
Only a small subset of filters work with wildcards, such as the case 
conversion filters.

But, you say that you are using the stemmer to remove diacritical marks... 
for that you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.
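
A minimal sketch of the first option, assuming the stemmer is dropped entirely and diacritics are handled by ASCIIFoldingFilterFactory instead. The field type name text_es_folded is hypothetical; the tokenizer and stop word settings are taken from the original question.

```xml
<!-- Hypothetical field type; the name text_es_folded is an assumption for illustration. -->
<fieldType name="text_es_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball"/>
    <!-- Replaces the stemmer: folds accented characters to their ASCII equivalents. -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain the indexed term for naranja stays naranja, so cultivo_es:naranja* should have something to match. The trade-off is that plural and singular forms are no longer conflated, and ASCII folding also turns ñ into n.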

-- Jack Krupansky


Re: Wildcard malfunctioning

Posted by Ahmet Arslan <io...@yahoo.com>.

Hi Roman,

What you are experiencing is expected and known. Stemming and wildcard searches can be counterintuitive sometimes. Luckily, a remedy is available. Use the following filters, and your wildcard searches will be happy. Please note that this change requires a Solr restart and a re-index.

 <filter class="solr.KeywordRepeatFilterFactory"/>
 <filter class="solr.SpanishLightStemFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
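
For context, here is a sketch of how these three filters might sit inside the text_es type from the original question. Everything except the two added filters is copied from that question; the placement shown is an assumption about how the pieces fit together.

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball"/>
    <!-- Emits every token twice: once marked as a keyword (left unstemmed) and once as a normal token. -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- Stems only the copy that is not marked as a keyword. -->
    <filter class="solr.SpanishLightStemFilterFactory"/>
    <!-- Drops the second copy when stemming left the token unchanged. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

The idea is that the index then holds both the original token and its stem at the same position, so a wildcard such as naranja*, which is not stemmed at query time, can still match the unstemmed naranja. The uva* query presumably worked all along because the stemmer leaves such a short word unchanged.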

Regarding diacritics, please see 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory 
and http://wiki.apache.org/solr/MultitermQueryAnalysis
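
The multiterm page linked above describes a separate analyzer that is applied only to wildcard, prefix and similar terms. A hedged sketch of such a setup follows; the field type name text_es_mt and the choice of filters are assumptions for illustration, not something taken from this thread.

```xml
<!-- Hypothetical field type; the name text_es_mt is an assumption for illustration. -->
<fieldType name="text_es_mt" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <!-- Applied to wildcard and prefix terms; keeps the whole term as one token. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, a query such as Limón* would be lowercased and folded to limon* before it is matched against the index.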

Ahmet

