Posted to solr-user@lucene.apache.org by Jae Joo <ja...@gmail.com> on 2009/12/22 23:33:02 UTC
solr.RemoveDuplicatesTokenFilterFactory
Hi,
Here is the string to be indexed without duplication.
Kitchen Cabinet Utah Kitchen Remodeling Utah
Is RemoveDuplicatesTokenFilterFactory for this solution? or for something
else?
Jae
Re: solr.RemoveDuplicatesTokenFilterFactory
Posted by Chris Hostetter <ho...@fucit.org>.
: Here is the string to be indexed without duplication.
:
: Kitchen Cabinet Utah Kitchen Remodeling Utah
:
: Is RemoveDuplicatesTokenFilterFactory for this solution? or for something
: else?
it depends on what you want to do ... you've given us an example of some
input, but you haven't elaborated on what solution you want.
This is the documentation for RemoveDuplicatesTokenFilter...
A TokenFilter which filters out Tokens at the same position and Term
text as the previous token in the stream.
...it only removes duplicates that occur at the same position, so if your
goal is to only have "Kitchen" and "Utah" indexed once, then it will only
do that if you have a tokenizer (or some other token filter) that flattens
out the positionIncrements of all the tokens to 0.
-Hoss
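To make the point about positions concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not Lucene's actual code: tokens are represented as hypothetical (term, positionIncrement) pairs, and the "flattened" stream is an assumed illustration of what a tokenizer or filter that sets the increments to 0 might produce.

```python
def remove_duplicates(tokens):
    """Drop a token if the same term was already seen at the same position.

    tokens: iterable of (term, position_increment) pairs.
    Simplified model of the filter's documented behaviour, not Lucene code.
    """
    seen_at_position = set()
    kept = []
    for term, pos_inc in tokens:
        if pos_inc > 0:
            # Moving to a new position: forget the terms seen at the old one.
            seen_at_position = set()
        elif term in seen_at_position:
            # Same position and same term text as an earlier token: skip it.
            continue
        seen_at_position.add(term)
        kept.append((term, pos_inc))
    return kept


words = "Kitchen Cabinet Utah Kitchen Remodeling Utah".split()

# Typical tokenizer output: every token advances the position by 1.
normal = [(w, 1) for w in words]

# Assumed "flattened" stream: all tokens after the first share one position.
flattened = [(w, 1 if i == 0 else 0) for i, w in enumerate(words)]

print([t for t, _ in remove_duplicates(normal)])
print([t for t, _ in remove_duplicates(flattened)])
```

In this model the normal stream keeps all six tokens, because each one sits at its own position, while the flattened stream keeps only Kitchen, Cabinet, Utah and Remodeling.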
Re: solr.RemoveDuplicatesTokenFilterFactory
Posted by Lance Norskog <go...@gmail.com>.
Looking at the code, it does appear to be what you want.
In the analysis.jsp page, you can see exactly how your text is processed.
http://localhost:8983/solr/admin/analysis.jsp
On Tue, Dec 22, 2009 at 2:33 PM, Jae Joo <ja...@gmail.com> wrote:
> Hi,
>
> Here is the string to be indexed without duplication.
>
> Kitchen Cabinet Utah Kitchen Remodeling Utah
>
> Is RemoveDuplicatesTokenFilterFactory for this solution? or for something
> else?
>
> Jae
>
--
Lance Norskog
goksron@gmail.com