Posted to solr-user@lucene.apache.org by benjelloun <an...@gmail.com> on 2014/07/31 16:48:47 UTC

Auto suggest with adding accents

Hello,

I'm trying to autosuggest French words with accents,
but if the user types q="gene", it does not suggest "genève"; it suggests
"general", "genetic", ...

<searchComponent class="solr.SpellCheckComponent" name="suggests">
  <lst name="spellchecker">
    <str name="name">suggestDic</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="storeDir">suggestFolder</str>
    <str name="field">suggestField</str>
    <str name="buildOnCommit">true</str>
    <bool name="exactMatchFirst">true</bool>
    <str name="sourceLocation ">suggest/emptyDic.txt</str>
  </lst>
  <str name="queryAnalyzerFieldType">textSuggest</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggests">
  <lst name="defaults">
    <str name="name">suggests</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestDic</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">6</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">6</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="components">
    <str>suggests</str>
  </arr>
</requestHandler>
 
The field "suggestField" does not fold (strip) accents.

Thanks for your help,

Best regards,
Anass BENJELLOUN




--
View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Auto suggest with adding accents

Posted by benjelloun <an...@gmail.com>.
Has anyone found a solution for this problem?




Re: Auto suggest with adding accents

Posted by benjelloun <an...@gmail.com>.
Hello,
With the new suggester, it does not work when the field is multiValued="true"
(using <str name="suggestAnalyzerFieldType">).

I need to try the patch LUCENE-3842 to test autocomplete, but I don't know
how. I have Solr 4.7.2 (the binary release, not the source code).
Can someone help?

Best regards,
Anass BENJELLOUN




Re: Auto suggest with adding accents

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Perhaps the actual suggester module is a better fit then:

http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html
http://romiawasthy.blogspot.fi/2014/06/configure-solr-suggester.html

Also:
http://jayant7k.blogspot.com/2014/03/an-interesting-suggester-in-solr.html
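For illustration, a minimal sketch of the newer suggester module (solr.SuggestComponent, available from Solr 4.7) configured with the analyzing lookup described in the first link. The idea behind the analyzing suggester is that matching happens on the analyzed form of each entry ("geneve") while the original surface form ("genève") is what gets returned. This assumes a hypothetical accent-folding field type named textSuggestFolded and a hypothetical numeric field named weight; neither comes from this thread.

```xml
<!-- Sketch only. "textSuggestFolded" and "weight" are hypothetical names. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestDic</str>
    <!-- Analyzing lookup: matches on the analyzed (e.g. accent-folded) form,
         returns the original, accented surface form -->
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str>
    <!-- Builds entries from the stored values of the field -->
    <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
    <str name="field">suggestField</str>
    <str name="weightField">weight</str>
    <!-- Field type whose analyzer folds accents, e.g. with ASCIIFoldingFilterFactory -->
    <str name="suggestAnalyzerFieldType">textSuggestFolded</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">suggestDic</str>
    <str name="suggest.count">6</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Under these assumptions, a request such as /suggest?suggest.q=gene would be analyzed to the folded prefix and could return the accented entry "genève".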

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 3:21 PM, Otis Gospodnetic
<ot...@gmail.com> wrote:
> Aha.  I don't know if Solr Suggester can do that.  Let's see what others
> say.  I know http://www.sematext.com/products/autocomplete/ could do that.
>
> Otis

Re: Auto suggest with adding accents

Posted by Otis Gospodnetic <ot...@gmail.com>.
Aha.  I don't know if Solr Suggester can do that.  Let's see what others
say.  I know http://www.sematext.com/products/autocomplete/ could do that.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



Re: Auto suggest with adding accents

Posted by benjelloun <an...@gmail.com>.
Hello,

I think my problem was misunderstood, so here is an example.
The document contains the word "genève".
q="gene": the autosuggestion gives "geneve"
q="genè": the autosuggestion gives "genève"

What I need is for q="gene" to suggest "genève", with the accent, like a
word correction.
I tried adding a spellchecker to correct it, but the maximum number of
characters it can correct is 2.
Maybe there is another solution. Here is my field schema:

<fieldType name="textSuggest" class="solr.TextField"
    positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopApostrophe.txt" ignoreCase="true"/>
    <!-- folds accents at index time only; preserveOriginal also keeps the accented token -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopApostrophe.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

Thanks, best regards,
Anass BENJELLOUN





Re: Auto suggest with adding accents

Posted by Otis Gospodnetic <ot...@gmail.com>.
You need to do the opposite.  Make sure accents are NOT removed at index &
query time.
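To illustrate Otis's point, a minimal sketch of a field type that leaves accents intact at both index and query time; the name textSuggestAccents is hypothetical. With a plain prefix lookup over such terms, "genè" can match "genève", but "gene" cannot, because the stored term still contains "è".

```xml
<!-- Sketch only: no ASCIIFoldingFilterFactory anywhere, so accents survive.
     "textSuggestAccents" is a hypothetical name. -->
<fieldType name="textSuggestAccents" class="solr.TextField" positionIncrementGap="100">
  <!-- a single <analyzer> is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```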

Otis

Re: Auto suggest with adding accents

Posted by benjelloun <an...@gmail.com>.
Hi,

With q="gene", it suggests "geneve".
ASCIIFoldingFilter strips the accents.

What I need it to suggest is "genève".

Any ideas?

Thanks,
best regards,
Anass BENJELLOUN




Re: Auto suggest with adding accents

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,

What happens when you add ASCIIFoldingFilter to the field type definition of suggestField?

Ahmet
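A minimal sketch of what Ahmet's suggestion could look like, assuming suggestField gets its own field type; the name textSuggestFolded is hypothetical. A single <analyzer> element applies the same chain at index and query time. Note that when suggestions are built from the field's indexed terms, folding the field also removes the accents from the suggestions themselves, which is consistent with the "geneve" result reported in the follow-up.

```xml
<!-- Sketch only: "textSuggestFolded" is a hypothetical name. -->
<fieldType name="textSuggestFolded" class="solr.TextField" positionIncrementGap="100">
  <!-- same chain for indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- maps "genève" to "geneve" on both sides -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```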

