Posted to solr-user@lucene.apache.org by Jae Joo <ja...@gmail.com> on 2009/12/22 23:33:02 UTC

solr.RemoveDuplicatesTokenFilterFactory

Hi,

Here is the string to be indexed without duplication.

Kitchen Cabinet Utah Kitchen Remodeling Utah

Is RemoveDuplicatesTokenFilterFactory for this solution? or for something
else?

Jae

Re: solr.RemoveDuplicatesTokenFilterFactory

Posted by Chris Hostetter <ho...@fucit.org>.
: Here is the string to be indexed without duplication.
: 
: Kitchen Cabinet Utah Kitchen Remodeling Utah
: 
: Is RemoveDuplicatesTokenFilterFactory for this solution? or for something
: else?

it depends on what you want to do ... you've given us an example of some 
input, but you haven't elaborated on what solution you want.

This is the documentation for RemoveDuplicatesTokenFilter...

   A TokenFilter which filters out Tokens at the same position and Term
   text as the previous token in the stream.

...it only removes duplicates that occur at the same position, so if your 
goal is to only have "Kitchen" and "Utah" indexed once, then it will only 
do that if you have a tokenizer (or some other token filter) that flattens 
out the positionIncrements of all the tokens to 0.
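
To make the "same position" point concrete, here is a rough Python sketch of 
the behaviour described above. It is not the actual Lucene code: the function 
name remove_duplicates and the (term, position_increment) tuples are made up 
for illustration, and the "flattened" stream assumes some hypothetical 
upstream filter has set every increment after the first to 0.

```python
def remove_duplicates(tokens):
    """Drop tokens whose term was already seen at the current position.

    tokens: iterable of (term, position_increment) tuples (illustrative only).
    """
    seen = set()   # terms already emitted at the current position
    kept = []
    for term, increment in tokens:
        if increment > 0:
            # The stream moved to a new position, so forget earlier terms.
            seen.clear()
        if term in seen:
            # Same term at the same position: this is the duplicate to drop.
            continue
        seen.add(term)
        kept.append((term, increment))
    return kept


text = "Kitchen Cabinet Utah Kitchen Remodeling Utah"
words = text.split()

# A typical whitespace-style tokenizer: every token advances the position by 1.
normal = [(w, 1) for w in words]

# Assumed flattened stream: all tokens after the first share one position.
flattened = [(w, 1 if i == 0 else 0) for i, w in enumerate(words)]

print(remove_duplicates(normal))
print(remove_duplicates(flattened))
```

With the normal stream nothing is dropped, because each "Kitchen" and "Utah" 
sits at its own position. Only the flattened stream loses the repeats.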


-Hoss


Re: solr.RemoveDuplicatesTokenFilterFactory

Posted by Lance Norskog <go...@gmail.com>.
Looking at the code, it does appear to be what you want.

In the analysis.jsp page, you can see exactly how your text is processed.

http://localhost:8983/solr/admin/analysis.jsp

On Tue, Dec 22, 2009 at 2:33 PM, Jae Joo <ja...@gmail.com> wrote:
> Hi,
>
> Here is the string to be indexed without duplication.
>
> Kitchen Cabinet Utah Kitchen Remodeling Utah
>
> Is RemoveDuplicatesTokenFilterFactory for this solution? or for something
> else?
>
> Jae
>



-- 
Lance Norskog
goksron@gmail.com