Posted to solr-user@lucene.apache.org by David Giffin <da...@giffin.org> on 2009/06/03 19:57:42 UTC
Token filter on multivalue field
Hi There,
I'm working on a unique token filter, to eliminate duplicates on a
multivalue field. My filter works properly for a single value field.
It seems that a new TokenFilter is created for each value in the
multivalue field. I need to maintain an array of used tokens across
all of the values in the multivalue field. Is there a good way to do
this? Here is my current code:
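import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class UniqueTokenFilter extends TokenFilter {
    // Terms already emitted by this filter instance.
    private final Set<String> words = new HashSet<String>();

    public UniqueTokenFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final Token next(Token in) throws IOException {
        // Pass the reusable token on every call, not only the first one.
        for (Token token = input.next(in); token != null; token = input.next(in)) {
            // add() returns false when the term has been seen before.
            if (words.add(token.term())) {
                return token;
            }
        }
        return null;
    }
}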
Thanks,
David
Re: Token filter on multivalue field
Posted by David Giffin <da...@giffin.org>.
I'm doing a combination of update processor and token filter. The
token filter is necessary to reduce the duplicates after stemming has
occurred.
David
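To illustrate why both stages can be needed, here is a minimal sketch in plain Java, with no Solr or Lucene classes involved. The class name TwoStageDedupe, its methods, and the toy stemmer that strips a trailing "s" are all hypothetical stand-ins: stage 1 plays the role of the update processor (dropping exact duplicate values), and stage 2 plays the role of the token filter (dropping terms that only collide after stemming).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoStageDedupe {
    // Stage 1 (update-processor role): drop exact duplicate raw values, keeping order.
    static List<String> dedupeValues(List<String> values) {
        return new ArrayList<String>(new LinkedHashSet<String>(values));
    }

    // Hypothetical toy stemmer: strips one trailing "s". Stands in for a real stemmer.
    static String toyStem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Stage 2 (token-filter role): drop terms that become equal after stemming.
    static List<String> dedupeStems(List<String> values) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String value : values) {
            for (String term : value.split("\\s+")) {
                seen.add(toyStem(term));
            }
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("cats", "cat", "cats", "dog");
        List<String> values = dedupeValues(raw);
        System.out.println(values);              // exact duplicates removed
        System.out.println(dedupeStems(values)); // stem collisions removed
    }
}
```

In this sketch, "cats" and "cat" both survive stage 1 because they differ as raw strings, and only stage 2 collapses them.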
2009/6/4 Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>:
> Isn't it better to use an UpdateProcessor for this?
Re: Token filter on multivalue field
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
Isn't it better to use an UpdateProcessor for this?
On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
>
> Hello,
>
> It's ugly, but the first thing that came to mind was ThreadLocal.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Token filter on multivalue field
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,
It's ugly, but the first thing that came to mind was ThreadLocal.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
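A rough sketch of the ThreadLocal idea, in plain Java rather than against the Lucene API. It assumes that all values of one field are analyzed sequentially on the same thread, and that something calls reset() before each document; both the class ThreadLocalSeenTerms and that reset hook are hypothetical. Note that any other field analyzed on the same thread between resets would share the set too, which is part of why this approach is ugly.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ThreadLocalSeenTerms {
    // One set per thread, shared by every filter instance that thread creates.
    private static final ThreadLocal<Set<String>> SEEN = new ThreadLocal<Set<String>>() {
        @Override
        protected Set<String> initialValue() {
            return new HashSet<String>();
        }
    };

    // Hypothetical hook: must run before each document, or terms leak across documents.
    static void reset() {
        SEEN.get().clear();
    }

    // Stand-in for one per-value filter instance: local output, thread-wide state.
    static List<String> filterValue(String value) {
        List<String> kept = new ArrayList<String>();
        for (String term : value.split("\\s+")) {
            // add() returns false if an earlier value already produced this term.
            if (SEEN.get().add(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        reset();
        System.out.println(filterValue("red green"));  // first value of the field
        System.out.println(filterValue("green blue")); // second value: "green" is dropped
    }
}
```

The state lives outside any single filter instance, so it survives the creation of a new filter for each value, at the cost of needing an explicit reset.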