Posted to solr-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2014/12/04 09:05:04 UTC

Keeping capitalization in suggestions?

When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä", I'd expect to get "Chamäleon" (capitalized) in both cases.
But what happens is:

If the LowerCaseFilter (see (1) below) is set:
"chamä" returns "chamäleon"
"Chamä" does not match

If the LowerCaseFilter (1) is not set:
"Chamä" returns "Chamäleon"
"chamä" does not match

I guess the LowerCaseFilter should not be active, but then how do I get matches when the search term is lowercased?

Context:
schema.xml
...
    <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
      </analyzer>
    </fieldType>
...
    <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/> <!-- (1) -->
      </analyzer>
    </fieldType>

solrconfig.xml
-----------------
...
    <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
        <lst name="defaults">
            <str name="echoParams">none</str>
            <str name="wt">json</str>
            <str name="indent">false</str>
            <str name="spellcheck">true</str>
            <str name="spellcheck.dictionary">suggestDictionary</str>
            <str name="spellcheck.onlyMorePopular">true</str>
            <str name="spellcheck.count">5</str>
            <str name="spellcheck.collate">false</str>
        </lst>
        <arr name="components">
            <str>suggest</str>
        </arr>
    </requestHandler>
...
    <searchComponent class="solr.SpellCheckComponent" name="suggest">
        <lst name="spellchecker">
            <str name="name">suggestDictionary</str>
            <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
            <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
            <str name="field">suggest</str>
            <float name="threshold">0.</float>
            <str name="buildOnCommit">true</str>
        </lst>
    </searchComponent>
...


Re: Keeping capitalization in suggestions?

Posted by Gopal Patwa <go...@gmail.com>.
More detail can be found in the Solr docs:

https://cwiki.apache.org/confluence/display/solr/Suggester


On Thu, Dec 4, 2014 at 6:33 AM, Clemens Wyss DEV <cl...@mysign.ch>
wrote:

> Enter the "factory"! ;)
> <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>

AW: Keeping capitalization in suggestions?

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Enter the "factory"! ;)
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemensdev@mysign.ch]
Sent: Thursday, 4 December 2014 14:46
To: solr-user@lucene.apache.org
Subject: AW: Keeping capitalization in suggestions?

Thx.
Where (in which jar) do I find org.apache.solr.spelling.suggest.AnalyzingInfixSuggester?
Or:
How do I declare the suggest searchComponent in solrconfig.xml to make use of (Lucene's?) AnalyzingInfixSuggester?



AW: Keeping capitalization in suggestions?

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Thx.
Where (in which jar) do I find org.apache.solr.spelling.suggest.AnalyzingInfixSuggester?
Or:
How do I declare the suggest searchComponent in solrconfig.xml to make use of (Lucene's?) AnalyzingInfixSuggester?

-----Original Message-----
From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
Sent: Thursday, 4 December 2014 14:05
To: solr-user@lucene.apache.org
Subject: Re: Keeping capitalization in suggestions?

Have a look at AnalyzingInfixSuggester - it does what you want.

-Mike



Re: AW: AW: Keeping capitalization in suggestions?

Posted by Ryan Yacyshyn <ry...@gmail.com>.
Hi Clemens,

I recently added typeahead functionality to something I'm playing with and
I used the EdgeNGramFilterFactory to help. I just tried this out after
adding a doc with "Chamäleon" in my title.

I was able to get "Chamäleon", with a capital C, returned when I searched for
chama, Chama, chamã, and Chamã.

Here's what I have in my files:

-----------------
solrconfig.xml:

<requestHandler name="/suggest_movie" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="defType">edismax</str>
    <str name="rows">10</str>
    <!-- keeping the response as lean as possible so not returning header info -->
    <str name="omitHeader">true</str>
    <!-- only returning 'title', and I want that key to be called 'value' in the response -->
    <str name="fl">value:title</str>
    <!-- boosting title to show on top if exact match with query -->
    <str name="qf">title^10 suggest_ngram</str>
  </lst>
</requestHandler>

-----------------
schema.xml:

<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
    <!-- create edge n-grams of each term when indexing, not when querying -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
  </analyzer>
</fieldType>

...

<field name="suggest_ngram" type="text_suggest_ngram" indexed="true" stored="false" />

...

<copyField source="title" dest="suggest_ngram" />

-----------------
request:

http://localhost:8983/solr/movies/suggest_movie?q=chama

-----------------
response:

{
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
            {
                "value": "Chamäleon"
            }
        ]
    }
}

Hope this helps?

Ryan
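
Ryan's message does not show how the `title` field itself is defined. The fragment below is a minimal sketch of what it would need to look like, not his actual schema: the type name `text_general` is an assumption borrowed from the Solr example schema. The part that matters is stored="true", because `fl=value:title` returns the stored value, and that is what keeps the capital C. Matching, on the other hand, happens against the lowercased, ASCII-folded edge n-grams in `suggest_ngram`.

```xml
<!-- schema.xml (assumed; this definition is not part of the original message) -->

<!-- stored="true" keeps the original text, e.g. "Chamäleon", for the response.
     "text_general" is an assumed type name. -->
<field name="title" type="text_general" indexed="true" stored="true" />

<!-- How the analysis chain of text_suggest_ngram treats the example:
     index: "Chamäleon" -> lowercase "chamäleon" -> ASCII folding "chamaleon"
            -> edge n-grams "ch", "cha", "cham", "chama", ... "chamaleon"
     query: "Chamä"     -> lowercase "chamä"     -> ASCII folding "chama"
     The query term "chama" equals one of the indexed n-grams, so the document
     matches regardless of case or umlaut, and the stored title is returned. -->
```

The trade-off of this approach is that the handler returns matching documents rather than terms from a suggester dictionary.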




On Tue Dec 09 2014 at 7:21:02 AM Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

> Clemens --
>
>    what I do (see the title suggestions for books on $EMPLOYER's web
> site) is to define a field with no analysis (type=keyword, use
> KeywordAnalyzer) and build the suggestions from that.  Then tell
> AnalyzingInfixSuggester (AIS) to use an analyzer internally to pick out
> words from that (StandardAnalyzer, or WhitespaceAnalyzer, with
> LowerCaseFilter - however you want the matching to work in the
> suggester).  It will return the terms from the source field.
>
> You didn't show the definition of your "suggest" field - I expect it
> must be analyzed, right?  Just don't do that.
>
> -Mike

Re: AW: AW: Keeping capitalization in suggestions?

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Clemens --

   what I do (see the title suggestions for books on $EMPLOYER's web
site) is to define a field with no analysis (type=keyword, use
KeywordAnalyzer) and build the suggestions from that.  Then tell
AnalyzingInfixSuggester (AIS) to use an analyzer internally to pick out
words from that (StandardAnalyzer, or WhitespaceAnalyzer, with
LowerCaseFilter - however you want the matching to work in the
suggester).  It will return the terms from the source field.

You didn't show the definition of your "suggest" field - I expect it
must be analyzed, right?  Just don't do that.

-Mike
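
The setup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a configuration taken from the thread: it assumes Solr 4.7 or later (where `solr.SuggestComponent` is available), and the names `suggest_raw`, `title`, `text_suggest_match`, `infixSuggester`, `suggest_infix`, `/suggest_infix` and the `indexPath` value are made up for the example. A `string` field stands in for the "field with no analysis".

```xml
<!-- schema.xml (sketch; every name marked "assumed" is illustrative) -->

<!-- Unanalyzed source field: suggestions are returned from these values as
     they were indexed, so "Chamäleon" keeps its capital C.
     stored="true" because DocumentDictionaryFactory reads stored values.
     "suggest_raw" is an assumed name. -->
<field name="suggest_raw" type="string" indexed="true" stored="true"/>

<!-- "title" is an assumed source field -->
<copyField source="title" dest="suggest_raw"/>

<!-- Analyzer used only for matching inside the suggester. Lowercasing here
     lets "chamä" and "Chamä" find the same entry. Assumed type name. -->
<fieldType name="text_suggest_match" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml (sketch) -->

<searchComponent name="suggest_infix" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>              <!-- assumed name -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest_raw</str>                <!-- source of the returned text -->
    <str name="suggestAnalyzerFieldType">text_suggest_match</str>
    <str name="indexPath">suggester_infix</str>        <!-- assumed directory name -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest_infix" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest_infix</str>
  </arr>
</requestHandler>
```

With something like this in place, a request such as `/suggest_infix?suggest.q=chamä` (adding `suggest.build=true` on the first call) would be expected to return the original, capitalized value. Two caveats: some releases may additionally require a `weightField` for DocumentDictionaryFactory, and the infix suggester may wrap the matched part in highlight markup, so check the Suggester documentation for the version in use.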

On 12/09/2014 08:58 AM, Clemens Wyss DEV wrote:
> Thanks for all the insightful links.
> I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr but that approach returns search results instead of term suggestions.
>
> I have (at the moment) a solution based on http://wiki.apache.org/solr/TermsComponent . But I might want multi-term suggestions (and fuzziness).
> Therefore I'd be very much interested in how AnalyzingInfixLookupFactory (or any other suggest component) would allow me to
> a) return case-sensitive suggestions (i.e. as indexed/stored)
> b) allow case-insensitive suggestion lookup
> ?
> Anybody else doing what I'd like to do?
>
> -----Original Message-----
> From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
> Sent: Monday, 8 December 2014 19:25
> To: solr-user@lucene.apache.org
> Subject: Re: AW: Keeping capitalization in suggestions?
>
> Hi Clemens,
>
> There are a number of ways to implement autocomplete/suggest. Some of them pull data from indexed terms, so the suggestions come back as analyzed (lowercased, in your case). Some pull data from stored values, so capitalisation is preserved.
>
> Here are great resources on this topic.
>
> https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
>
> Ahmet
>
>
> On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV <cl...@mysign.ch> wrote:
>
> Although I am now using AnalyzingInfixSuggester, I am still getting "either/or".
>
> When the lowercase filter is active I always get suggestions, BUT they are lowercased (e.g. "chamäleon").
> When the lowercase filter is not active I only get suggestions when querying "Chamä".
>
> my solrconfig.xml
> ...
>      <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>          <lst name="defaults">
>              <str name="echoParams">none</str>
>              <str name="wt">json</str>
>              <str name="indent">false</str>
>              <str name="spellcheck">true</str>
>              <str name="spellcheck.dictionary">suggestDictionary</str>
>              <str name="spellcheck.onlyMorePopular">true</str>
>              <str name="spellcheck.count">5</str>
>              <str name="spellcheck.collate">false</str>
>          </lst>
>          <arr name="components">
>              <str>suggest</str>
>          </arr>
>      </requestHandler>
> ...
>      <searchComponent class="solr.SpellCheckComponent" name="suggest">
>        <lst name="spellchecker">
>          <str name="name">suggestDictionary</str>
>          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>          <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
>          <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
>          <str name="field">suggest</str>
>          <str name="buildOnCommit">true</str>
>          <str name="storeDir">suggester</str>
>          <str name="suggestAnalyzerFieldType">text_suggest</str>
>          <str name="minPrefixChars">4</str>
>        </lst>
>      </searchComponent>
> ...
>
> my schema.xml
> ...
> <field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
> ...
>      <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
>        </analyzer>
>      </fieldType>
> ...
>
>
> -----Original Message-----
> From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
> Sent: Thursday, 4 December 2014 14:05
> To: solr-user@lucene.apache.org
> Subject: Re: Keeping capitalization in suggestions?
>
> Have a look at AnalyzingInfixSuggester - it does what you want.
>
> -Mike
>
> On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
>> When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä", I'd expect to get "Chamäleon" (capitalized) in both cases.
>> But what happens is:
>>
>> If the LowerCaseFilter (see (1) below) is set:
>> "chamä" returns "chamäleon"
>> "Chamä" does not match
>>
>> If the LowerCaseFilter (1) is not set:
>> "Chamä" returns "Chamäleon"
>> "chamä" does not match
>>
>> I guess the LowerCaseFilter should not be active, but then how do I get matches when the search term is lowercased?
>>
>> Context:
>> schema.xml
>> ...
>>       <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
>>         <analyzer type="index">
>>           <tokenizer class="solr.StandardTokenizerFactory"/>
>>           <filter class="solr.LowerCaseFilterFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
>>           <filter class="solr.GermanLightStemFilterFactory"/>
>>         </analyzer>
>>         <analyzer type="query">
>>           <tokenizer class="solr.StandardTokenizerFactory"/>
>>           <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>>           <filter class="solr.LowerCaseFilterFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
>>           <filter class="solr.GermanLightStemFilterFactory"/>
>>         </analyzer>
>>       </fieldType>
>> ...
>>       <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
>>         <analyzer>
>>           <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>           <filter class="solr.LowerCaseFilterFactory"/> <!-- (1) -->
>>         </analyzer>
>>       </fieldType>
>>
>> solrconfig.xml
>> -----------------
>> ...
>>       <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>>           <lst name="defaults">
>>               <str name="echoParams">none</str>
>>               <str name="wt">json</str>
>>               <str name="indent">false</str>
>>               <str name="spellcheck">true</str>
>>               <str name="spellcheck.dictionary">suggestDictionary</str>
>>               <str name="spellcheck.onlyMorePopular">true</str>
>>               <str name="spellcheck.count">5</str>
>>               <str name="spellcheck.collate">false</str>
>>           </lst>
>>           <arr name="components">
>>               <str>suggest</str>
>>           </arr>
>>       </requestHandler>
>> ...
>>       <searchComponent class="solr.SpellCheckComponent" name="suggest">
>>           <lst name="spellchecker">
>>               <str name="name">suggestDictionary</str>
>>               <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>>               <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
>>               <str name="field">suggest</str>
>>               <float name="threshold">0.</float>
>>               <str name="buildOnCommit">true</str>
>>           </lst>
>>       </searchComponent>
>> ...
>>


AW: AW: Keeping capitalization in suggestions?

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Thanks for all the insightful links.
I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr but that approach returns search results instead of term suggestions.

I have (at the moment) a solution based on http://wiki.apache.org/solr/TermsComponent . But I might want multi-term suggestions (and fuzziness).
Therefore I'd be very interested in how AnalyzingInfixLookupFactory (or any other suggest component) could
a) return case-sensitive suggestions (i.e. as indexed/stored), and
b) allow case-insensitive suggestion lookup.
Is anybody else doing what I'd like to do?
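
One possible direction, offered as a sketch rather than a tested configuration: as far as I can tell, the spellcheck-based org.apache.solr.spelling.suggest.Suggester builds its dictionary from the indexed terms of the field, so whatever the index analyzer does to case shows up in the suggestions. Recent Solr 4.x releases also ship solr.SuggestComponent, which can pair AnalyzingInfixLookupFactory with DocumentDictionaryFactory. That dictionary reads the stored value of the field, while suggestAnalyzerFieldType only governs how the lookup matches. The sketch below assumes the "suggest" field is switched to stored="true" and that text_suggest has LowerCaseFilterFactory enabled; the suggester name "infixSuggester" is made up for illustration.

```xml
<!-- schema.xml: stored="true" (assumption) so the dictionary can read the original text -->
<field name="suggest" type="text_suggest" indexed="true" stored="true" multiValued="true"/>

<!-- solrconfig.xml: suggester that matches via the analyzer but returns stored values -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str> <!-- hypothetical name -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <!-- lowercasing in this field type makes lookup case-insensitive -->
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With this component the prefix goes in the suggest.q parameter instead of the spellcheck.* parameters, for example /suggest?suggest.q=chamä. Whether this behaves as hoped with a multiValued field depends on the Solr version, so treat it as something to try rather than a confirmed fix.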


Re: AW: Keeping capitalization in suggestions?

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Clemens,

There are a number of ways to implement autocomplete/suggest. Some of them pull data from indexed terms, so the suggestions come back lowercased. Others pull data from stored values, so capitalisation is preserved.

Here are great resources on this topic.

https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet
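
To make the distinction concrete, here is a minimal schema sketch (an illustration only, not taken from any of the linked articles). The analyzer chain lowercases what goes into the index, while the stored value keeps the original text. A component that reads indexed terms would see "chamäleon"; one that reads the stored value would see "Chamäleon". The type name "text_lc" and the field name "title_suggest" are hypothetical.

```xml
<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- runs at index and query time, so "Chamä" and "chamä" both become "chamä" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- indexed terms are lowercased; the stored value is kept verbatim -->
<field name="title_suggest" type="text_lc" indexed="true" stored="true"/>
```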



AW: Keeping capitalization in suggestions?

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Although I am now using AnalyzingInfixSuggester, I am still getting "either or".

When the lowercase filter is active I always get suggestions, BUT they are lowercased (i.e. "chamäleon").
When the lowercase filter is not active I only get suggestions when querying "Chamä".

my solrconfig.xml
...
    <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
        <lst name="defaults">
            <str name="echoParams">none</str>
            <str name="wt">json</str>
            <str name="indent">false</str>
            <str name="spellcheck">true</str>
            <str name="spellcheck.dictionary">suggestDictionary</str>
            <str name="spellcheck.onlyMorePopular">true</str>
            <str name="spellcheck.count">5</str>
            <str name="spellcheck.collate">false</str>
        </lst>
        <arr name="components">
            <str>suggest</str>
        </arr>
    </requestHandler>
...
    <searchComponent class="solr.SpellCheckComponent" name="suggest">
      <lst name="spellchecker">
        <str name="name">suggestDictionary</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
        <str name="field">suggest</str>  
        <str name="buildOnCommit">true</str>
        <str name="storeDir">suggester</str>
        <str name="suggestAnalyzerFieldType">text_suggest</str>
        <str name="minPrefixChars">4</str>
      </lst>
    </searchComponent>
...

my schema.xml
...
<field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
...
    <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
      </analyzer>
    </fieldType>
...



Re: Keeping capitalization in suggestions?

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Have a look at AnalyzingInfixSuggester - it does what you want.

-Mike
