Posted to solr-user@lucene.apache.org by Alexis Aravena Silva <aa...@itsofteg.com> on 2017/04/04 21:11:51 UTC

Problems creating index for suggestions

Hi,


I'm creating an index for suggestions. When I rebuild the index, which holds only 8 documents, Solr creates a temp file that grows to over 20 GB, and the rebuild takes more than 10 minutes. What is the problem? It makes no sense for Solr to take so long and use so much disk space:



Field Type Definition:


<fieldType name="text_suggestion" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>


Suggester Configuration:


<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">fuzzySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="indexPath">fuzzy_suggestions</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">_sugerencia_</str>
      <str name="payloadField">idTipoRegistro</str>
      <str name="suggestAnalyzerFieldType">text_suggestion</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="suggester">
      <str name="name">infixSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="indexPath">infix_suggestions</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">_sugerencia_</str>
      <str name="payloadField">idTipoRegistro</str>
      <str name="suggestAnalyzerFieldType">text_suggestion</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.dictionary">infixSuggester</str>
      <str name="suggest.dictionary">fuzzySuggester</str>
      <str name="suggest.onlyMorePopular">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>



I rebuild the suggestions once a week; that's why I set buildOnCommit = true.


Regards.

Re: Problems creating index for suggestions

Posted by Alexis Aravena Silva <aa...@itsofteg.com>.
Thanks Alessandro, I'll read the article.


Regards,

Alexis Aravena S.

Scrum Master & Agile Coach

Mobile: +569 69080134

Email: aaravena@itsofteg.com


________________________________
From: alessandro.benedetti <a....@sease.io>
Sent: Friday, April 7, 2017 11:39:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems creating index for suggestions

Hi Alexis,
this is not the reason for the 20 GB overhead, but you are certainly using the
suggester component in the wrong way.
You don't want the analysis chain to produce edge n-grams and then build the
FST out of those tokens.
Read the chapters in [1] on the suggesters you are interested in;
they may help you understand how the suggesters work.
At the very least, you should use an analysis chain without the EdgeNGram token filter.
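
A minimal sketch of the field type from the original post with the EdgeNGramFilterFactory removed, as suggested above. Collapsing the index and query analyzers into a single <analyzer> element is an assumption made here, since the two chains become identical once the n-gram filter is gone. Whether this change alone resolves the temp-file growth is not confirmed anywhere in this thread.

```xml
<!-- Hypothetical variant of text_suggestion: same tokenizer and filters as the
     original post, minus EdgeNGramFilterFactory. -->
<fieldType name="text_suggestion" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <!-- One analyzer, used at both build time and lookup time. -->
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The suggesters in the original configuration would pick this up through their existing suggestAnalyzerFieldType setting, so no change to the searchComponent should be needed for this part.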

[1] http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html

Cheers



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Problems creating index for suggestions

Posted by Alexis Aravena Silva <aa...@itsofteg.com>.
Hi Erick,


numDocs and maxDocs are both 8.


This is the content of the field _sugerencia_:


[inline image attachment, not included in this plain-text archive]



I've noticed that the problem occurs when Solr builds the fuzzySuggester index: while that suggester is being built, the temp file grows enormously, and it disappears when the process finishes.



Regards.


________________________________
From: Erick Erickson <er...@gmail.com>
Sent: Tuesday, April 4, 2017 8:05:42 PM
To: solr-user
Subject: Re: Problems creating index for suggestions

Something's indeed not what I'd expect here. One note: buildOnCommit
will rebuild the suggester every time the index has a document
committed _anywhere_. So if there's any activity at all in terms of
indexing your suggester is being built. I.e. if you have your
autocommit interval set to 1 minute and are actively indexing, your
suggester gets rebuilt every minute.
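
For a suggester that only needs rebuilding once a week, one possible arrangement is to switch off buildOnCommit and trigger the build explicitly. The sketch below repeats the fuzzySuggester definition from the original post with that single value changed. It is an illustration under that assumption, not a tested configuration.

```xml
<!-- Sketch: the fuzzySuggester from the original post, with automatic rebuilds disabled. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="indexPath">fuzzy_suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">_sugerencia_</str>
    <str name="payloadField">idTipoRegistro</str>
    <str name="suggestAnalyzerFieldType">text_suggestion</str>
    <str name="buildOnStartup">false</str>
    <!-- Changed from true: the suggester is no longer rebuilt on every commit. -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

With this in place, a rebuild can be requested on demand by calling the /suggest handler with suggest.build=true and suggest.dictionary=fuzzySuggester, for example from a weekly scheduled job (the scheduling mechanism is an assumption and is not part of Solr's configuration shown here).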

But that's not your problem. How big is the index this suggester is
part of? You say 8 documents. Exclusive of the suggester parts of the
index, how big is the rest of your index on disk?

The suggester re-reads all of the stored values in your entire base
index for the field _sugerencia_ to build itself. So I'm guessing that
when you say the index is 8 documents it's not quite what you think it
is.

On the admin screen, what are numDocs and maxDocs for the index in question?

Best,
Erick

