Posted to solr-user@lucene.apache.org by Iniyan <in...@gmail.com> on 2015/03/24 18:19:53 UTC
Regarding detection of duplication
Hi,
My requirement is to detect duplicate titles after removing punctuation
marks, stop words, and accented characters.
I am trying to do an exact match first. After that, I am thinking of
applying filters.
I have tried solr.KeywordTokenizerFactory. It does exact matching. But
when I add
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true" />
the stop filter does not work.
But if I apply solr.StandardTokenizerFactory, I do not get an exact
match.
Title:
What is a apple?
What is an apple?
What is the apple?
When I search for "What is a apple", I need to get all of the above.
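To make the requirement concrete, the intended normalization can be
sketched outside Solr in Python. This is only an illustration of the
desired matching behaviour, not a Solr configuration: the function name
normalize_title and the three-word stop list are assumptions made up for
this example (a real setup would read stopwords.txt).

```python
import re
import unicodedata

# Hypothetical stop word list for illustration only; not the real stopwords.txt.
STOP_WORDS = {"a", "an", "the"}


def normalize_title(title: str) -> str:
    """Return a key that is identical for titles that should count as duplicates."""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", title)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then replace every character that is not a word character
    # (letter, digit, underscore) or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", unaccented.lower())
    # Split into words and drop stop words; the remaining words form the key.
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    return " ".join(words)


titles = ["What is a apple?", "What is an apple?", "What is the apple?"]
keys = {normalize_title(t) for t in titles}
```

Under these assumptions all three titles reduce to the same key, so an
exact match on the key would return all of them.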
Could you please let me know whether there is any tokenizer/filter that
matches my requirement?
--
View this message in context: http://lucene.472066.n3.nabble.com/Regarding-detection-of-duplication-tp4194975.html