Posted to solr-user@lucene.apache.org by Sri Sirisha Vallabhaneni <si...@gmail.com> on 2018/06/20 03:36:55 UTC

Solr 6.5 autosuggest suggests misspelt words and unwanted words

Hi,

Our index contains un-curated data, which includes *cuss words* and
*misspelt words* such as *neeeed* instead of *need*. We are using an
auto-suggest/auto-complete feature that relies heavily on the indexed data to
recommend suggestions as the user types a query. We currently maintain a list
of stop words consisting of cuss words to keep a check on what is recommended
to the user, and this list might grow huge over time. Is there a clean way
to:

1. eliminate cuss words entirely from the suggestions, and
2. avoid suggesting misspelt words at all?

Thanks and Regards,
Sri

Re: Solr 6.5 autosuggest suggests misspelt words and unwanted words

Posted by Alessandro Benedetti <a....@sease.io>.
Hi,
you should curate your data; that is fundamental to having a healthy search
solution. But let's see what you can do anyway:

1) Curate a dictionary of such bad words and then configure the analysis
chain to skip them.
2) Have you tried different dictionary implementations? I would assume that
each misspelled word has a low document frequency. You could use the
HighFrequencyDictionaryFactory [1] and see how it goes.


[1]
https://lucene.apache.org/solr/guide/7_3/suggester.html#highfrequencydictionaryfactory
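
For illustration, the two suggestions above could be sketched roughly as
follows. This is only a starting point under stated assumptions, not a tested
setup: the field name suggest_terms, the field type text_suggest, the
suggester name frequentTerms, the file badwords.txt and the threshold value
are all hypothetical and need adapting to your schema and data.

```xml
<!-- Schema (managed-schema or schema.xml).
     Hypothetical field type: the stop filter drops every term listed in
     badwords.txt (one term per line), so those terms never reach the index
     that the suggester dictionary is built from. -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="badwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
<field name="suggest_terms" type="text_suggest" indexed="true" stored="false" multiValued="true"/>

<!-- solrconfig.xml.
     Suggester whose dictionary only keeps terms that occur in at least the
     given fraction of documents; rare terms (typically misspellings) are
     left out. The threshold here is an assumed value to tune. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">frequentTerms</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">suggest_terms</str>
    <str name="threshold">0.005</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

The threshold is a fraction between 0 and 1 of the total number of documents,
so 0.005 keeps only terms that appear in at least 0.5% of them. The component
still has to be attached to a request handler, and the suggester rebuilt
(for example with suggest.build=true) after the bad-word list or the
threshold changes.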



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io