Posted to solr-user@lucene.apache.org by Alexis Aravena Silva <aa...@itsofteg.com> on 2017/04/01 03:29:44 UTC

Suggestions with EdgeNGramFilterFactory and FuzzyLookupFactory

Hello All,


I'm using the suggester component in Solr 6.4 with FuzzyLookupFactory and AnalyzingInfixLookupFactory. Everything was OK until I added EdgeNGramFilterFactory to my field type definition. After loading 8 documents (I index manually), the indexing process consumes 16GB of my hard disk, which is very strange. This happens only with the FuzzyLookupFactory. During indexing I noticed that Solr creates a temp file in "solr-6.4.0\server\tmp". This is my configuration:

solrconfig.xml:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">fuzzySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="indexPath">fuzzy_suggestions</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">_sugerencia_</str>
      <str name="payloadField">idTipoRegistro</str>
      <str name="suggestAnalyzerFieldType">text_suggestion</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
    <lst name="suggester">
      <str name="name">infixSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="indexPath">infix_suggestions</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">_sugerencia_</str>
      <str name="payloadField">idTipoRegistro</str>
      <str name="suggestAnalyzerFieldType">text_suggestion</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>
  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.dictionary">infixSuggester</str>
      <str name="suggest.dictionary">fuzzySuggester</str>
      <str name="suggest.onlyMorePopular">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>



schema.xml:


<field name="_sugerencia_" type="text_suggestion" indexed="true" stored="true" multiValued="false" />


<fieldType name="text_suggestion" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>


If I remove EdgeNGramFilterFactory everything works ok, but I require this filter for the suggestions.


What is the problem?


Regards,

Alexis Aravena S.

Scrum Master & Agile Coach

Mobile: +569 69080134

Email: aaravena@itsofteg.com


Re: Suggestions with EdgeNGramFilterFactory and FuzzyLookupFactory

Posted by Alexis Aravena Silva <aa...@itsofteg.com>.
Hi Alexandre,


Using only the suggester, when I query the word "ferrada" I don't always get results, and I don't know why. For example, if I query:


ferr: I get a result

ferra: I don't get a result

ferrad: I don't get a result

ferrada: I get a result


Then I thought that matching letter by letter would give me the result, which is why I added the filter. I read that EdgeNGramFilterFactory allows suggestions letter by letter.
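
For illustration only, here is a minimal Python sketch of the prefixes an edge n-gram filter with minGramSize=1 and maxGramSize=50 would be expected to emit for the single token "ferrada". The function name edge_ngrams is hypothetical and this is not Solr's implementation; it only mirrors the idea of taking leading-edge substrings of each token.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 50) -> List[str]:
    """Return the leading-edge prefixes of `token` with lengths min_gram..max_gram.

    Hypothetical helper for illustration; a token shorter than min_gram
    yields no grams at all.
    """
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    # One gram per prefix length, so a 7-letter word yields 7 grams.
    for gram in edge_ngrams("ferrada"):
        print(gram)
```

Every prefix of the word, including "ferra" and "ferrad", appears in the output, which is the letter-by-letter behaviour described above.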


Do you have any suggestions? I need every partially typed word to return a result, please.



Regards.



Re: Suggestions with EdgeNGramFilterFactory and FuzzyLookupFactory

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Why do you think you need that filter when you are already using the suggester
component?

What specific case is it supposed to solve?

Regards,
   Alex
