Posted to solr-user@lucene.apache.org by David Giffin <da...@giffin.org> on 2009/06/03 19:57:42 UTC

Token filter on multivalue field

Hi There,

I'm working on a unique token filter, to eliminate duplicates on a
multivalue field. My filter works properly for a single value field.
It seems that a new TokenFilter is created for each value in the
multivalue field. I need to maintain an array of used tokens across
all of the values in the multivalue field. Is there a good way to do
this? Here is my current code:

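import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class UniqueTokenFilter extends TokenFilter {

    // Terms already emitted by this filter instance. The state lives in the
    // instance, so it does not carry over to a filter created for another value.
    private final Set<String> words = new HashSet<String>();

    public UniqueTokenFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final Token next(Token in) throws IOException {
        // Pass the reusable token on every call, not only on the first one.
        for (Token token = input.next(in); token != null; token = input.next(in)) {
            // add() returns false when the term has been seen before.
            if (words.add(token.term())) {
                return token;
            }
        }
        return null;
    }
}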

Thanks,
David

Re: Token filter on multivalue field

Posted by David Giffin <da...@giffin.org>.
I'm doing a combination of update processor and token filter. The
token filter is necessary to reduce the duplicates after stemming has
occurred.

David
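
To illustrate why both stages can be needed, here is a small sketch in plain Java with no Lucene or Solr classes. Everything in it is hypothetical: `StemDedupSketch` and its methods are made-up names, `toyStem` is only a stand-in for a real stemmer (it strips a single trailing "s"), and each value is treated as one token for brevity. The point is just that exact-value deduplication leaves "cats" and "cat" in place, and they only collide once stemming has run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class StemDedupSketch {

    // Hypothetical stand-in for a real stemmer: strips one trailing "s".
    static String toyStem(String term) {
        return term.endsWith("s") && term.length() > 1
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Stage 1 (update-processor style): drop exact duplicate raw values,
    // keeping the first occurrence of each.
    static List<String> dedupeValues(List<String> values) {
        return new ArrayList<String>(new LinkedHashSet<String>(values));
    }

    // Stage 2 (token-filter style): stem each term, then drop duplicate stems.
    static List<String> dedupeStems(List<String> terms) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String term : terms) {
            seen.add(toyStem(term));
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("cats", "cat", "cats", "dog");
        List<String> values = dedupeValues(raw);
        List<String> terms = dedupeStems(values);
        System.out.println(values); // [cats, cat, dog]
        System.out.println(terms);  // [cat, dog]
    }
}
```

Stage 1 alone still leaves two entries that index to the same stem, which is the case the token filter is meant to catch.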

2009/6/4 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
> Isn't it better to use an UpdateProcessor for this?

Re: Token filter on multivalue field

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Isn't it better to use an UpdateProcessor for this?


-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
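
As a rough illustration of the UpdateProcessor idea above, the sketch below shows only the deduplication step that such a processor might apply to the values of one multivalued field before they reach the analyzer. It is plain Java and does not use the Solr API: `FieldValueDeduper` and `uniqueWordsAcrossValues` are hypothetical names, and splitting on whitespace is an assumption standing in for whatever tokenization the field really uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FieldValueDeduper {

    // Rewrites the values of one multivalued field so that each
    // whitespace-separated word appears only once across all values.
    // Values that end up empty are dropped.
    static List<String> uniqueWordsAcrossValues(List<String> values) {
        Set<String> seen = new HashSet<String>();
        List<String> result = new ArrayList<String>();
        for (String value : values) {
            StringBuilder kept = new StringBuilder();
            for (String word : value.trim().split("\\s+")) {
                // seen.add() returns false for a word met in an earlier value.
                if (!word.isEmpty() && seen.add(word)) {
                    if (kept.length() > 0) {
                        kept.append(' ');
                    }
                    kept.append(word);
                }
            }
            if (kept.length() > 0) {
                result.add(kept.toString());
            }
        }
        return result;
    }
}
```

For example, the values "red apple", "green apple" and "red" would come out as "red apple" and "green". Because this runs on raw text, it cannot see duplicates that only appear after analysis, and dropping words changes the stored values and any phrase matching on them.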

Re: Token filter on multivalue field

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

It's ugly, but the first thing that came to mind was ThreadLocal.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
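
A minimal sketch of what the ThreadLocal approach could look like, written in plain Java without Lucene classes. It assumes that all values of a document's field are analyzed on the same thread and that something calls a reset before each document; where that reset would be hooked into Solr is not covered in this thread. `ThreadLocalDedupSketch`, `startDocument` and `filterValue` are hypothetical names, and `filterValue` stands in for one filter instance handling one value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ThreadLocalDedupSketch {

    // One set of already-emitted terms per thread, shared by every
    // filter instance that runs on that thread.
    private static final ThreadLocal<Set<String>> SEEN =
            new ThreadLocal<Set<String>>() {
                @Override
                protected Set<String> initialValue() {
                    return new HashSet<String>();
                }
            };

    // Must be called before each new document; otherwise terms from the
    // previous document on the same thread would be suppressed as well.
    static void startDocument() {
        SEEN.get().clear();
    }

    // Stand-in for one filter instance processing the tokens of one value.
    static List<String> filterValue(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            if (SEEN.get().add(token)) {
                out.add(token);
            }
        }
        return out;
    }
}
```

After `startDocument()`, filtering the tokens "a", "b" and then "b", "c" yields "a", "b" followed by just "c". The ugliness is the hidden shared state: a missed reset silently drops terms, and any other field analyzed with the same filter on that thread would share the set.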


