Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2012/03/27 16:06:58 UTC

preventing words from being indexed in spellcheck dictionary?

hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my
schema.

is there a way to prevent certain words from entering the dictionary - as
the dictionary is being built?

thanks for any help
mark

// snipped from solrconfig.xml

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">itemDescSpell</str>
      <str name="buildOnOptimize">true</str>
      <str name="spellcheckIndexDir">spellchecker_mark</str>
    </lst>


--
View this message in context: http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861472.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: preventing words from being indexed in spellcheck dictionary?

Posted by geeky2 <ge...@hotmail.com>.
thank you, James.


RE: preventing words from being indexed in spellcheck dictionary?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Assuming you're just using this field for spellcheck and not for queries, it doesn't matter whether the StopFilter also runs at query time.  But the correct way to do it is to have it in both places, as you do now.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: geeky2 [mailto:geeky2@hotmail.com] 
Sent: Tuesday, March 27, 2012 3:42 PM
To: solr-user@lucene.apache.org
Subject: RE: preventing words from being indexed in spellcheck dictionary?

hello,

should i apply the StopFilterFactory at index time or query time?

right now - per the schema below - i am applying it at BOTH index time and
query time.

is this correct?

thank you,
mark


// snipped from schema.xml



    <field name="itemDescSpell" type="textSpell"/>


  <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>



RE: preventing words from being indexed in spellcheck dictionary?

Posted by geeky2 <ge...@hotmail.com>.
thank you very much for the info ;)




RE: preventing words from being indexed in spellcheck dictionary?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
If the list of words isn't very long, you can add a StopFilter to the analysis for "itemDescSpell" and put the words you don't want in the stop list.  If you want to prevent low-occurring words from being used as corrections, use the "thresholdTokenFrequency" in your spellcheck configuration.
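
For illustration, a minimal sketch of both suggestions might look like the following. The stop list file name spellcheck_stopwords.txt and the 0.01 threshold are assumptions made for this example, not values from this thread; the remaining names come from the configuration posted above.

```xml
<!-- schema.xml: goes inside <analyzer type="index"> of the "textSpell" fieldType.
     spellcheck_stopwords.txt is a hypothetical file that lists, one per line,
     the words to keep out of the spellcheck dictionary. -->
<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="spellcheck_stopwords.txt"/>

<!-- solrconfig.xml: the spellchecker definition with a frequency threshold added.
     0.01 is an assumed example value; it is meant to exclude terms that occur
     in fewer than about 1% of documents. Tune it for your own data. -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
  <float name="thresholdTokenFrequency">0.01</float>
</lst>
```

Note that an index-time analyzer change only affects documents indexed after the change, so the field would need to be reindexed and the dictionary rebuilt (for example with spellcheck.build=true, or by optimizing, since buildOnOptimize is set) before the excluded words disappear from suggestions.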

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

