Posted to solr-user@lucene.apache.org by Iniyan <in...@gmail.com> on 2015/03/24 18:19:53 UTC

Regarding detection of duplication

Hi,

My requirement is to detect duplicate titles after stripping punctuation
marks, stop words, and accents.

I am trying to get exact matching working first. After that, I plan to
apply filters.

I have tried solr.KeywordTokenizerFactory, which does exact matching. But
when I add

<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />

the stop filter does not work.

But if I use solr.StandardTokenizerFactory instead, I do not get an exact
match.


Example titles:

What is a apple?
What is an apple?
What is the apple?

When I search for "What is a apple", I need to get all of the above.
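
To make the requirement concrete, here is a rough Python sketch of the
comparison I have in mind. It is only an illustration of the expected
behaviour, not Solr code: the function name dedup_key and the three-word
stop list are made up for this example, and the real stop words would come
from stopwords.txt.

```python
import string
import unicodedata

# Hypothetical stop word list for illustration; the real one lives in stopwords.txt.
STOP_WORDS = {"a", "an", "the"}


def dedup_key(title):
    """Reduce a title to a normalized key; equal keys mean duplicate titles."""
    # Decompose accented characters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Strip ASCII punctuation and lowercase the result.
    no_punct = no_accents.translate(str.maketrans("", "", string.punctuation)).lower()
    # Drop stop words and rejoin with single spaces.
    return " ".join(word for word in no_punct.split() if word not in STOP_WORDS)


titles = ["What is a apple?", "What is an apple?", "What is the apple?"]
keys = {dedup_key(t) for t in titles}  # all three titles share one key
```

With this normalization the query "What is a apple" produces the same key
as each of the three titles, which is the match I am trying to get from
Solr.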

Could you please let me know whether there is any tokenizer/filter that
matches my requirement?


