Posted to general@lucene.apache.org by Xi Shen <da...@gmail.com> on 2012/12/21 08:27:20 UTC

Fwd: Which token filter can combine 2 terms into 1?

Hi,

I am looking for a token filter that can combine 2 terms into 1. For example,

suppose the input has been tokenized on whitespace:

t1 t2 t2a t3

I want a filter that outputs:

t1 t2t2a t3

I know this is a very special case, and I am thinking about developing a
filter of my own, but I cannot figure out which API I should use to look at
the terms in a TokenStream.


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84

Re: Which token filter can combine 2 terms into 1?

Posted by Xi Shen <da...@gmail.com>.
Hi Steve,

This is a language-dependent case. Basically, I will use a whitespace
tokenizer to process the input. But some of the inputs should be a single
term instead of being split into 2 terms. I am thinking of developing a
special filter to fix these terms.
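
A minimal sketch of the lookahead logic such a filter would need, written
against a plain Iterator<String> rather than Lucene's classes so that it runs
on its own. The class name PairMergingIterator and the set of joinable pairs
are hypothetical. In an actual Lucene TokenFilter the same buffering would
presumably live in incrementToken(), reading terms through CharTermAttribute
and holding the lookahead token with captureState()/restoreState(); that
mapping is an assumption and is not confirmed anywhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical sketch: joins two adjacent terms when the pair is in a known set. */
final class PairMergingIterator implements Iterator<String> {
    private final Iterator<String> input;
    private final Set<String> joinablePairs; // entries such as "t2 t2a" (assumed format)
    private String pending;                  // one-token lookahead buffer, null when empty

    PairMergingIterator(Iterator<String> input, Set<String> joinablePairs) {
        this.input = input;
        this.joinablePairs = joinablePairs;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public String next() {
        String current;
        if (pending != null) {
            // A token was read ahead on the previous call; use it first.
            current = pending;
            pending = null;
        } else if (input.hasNext()) {
            current = input.next();
        } else {
            throw new NoSuchElementException();
        }
        if (input.hasNext()) {
            String lookahead = input.next();
            if (joinablePairs.contains(current + " " + lookahead)) {
                return current + lookahead; // known pair: emit one merged term
            }
            pending = lookahead;            // not a pair: keep it for the next call
        }
        return current;
    }

    /** Convenience helper that drains the iterator into a list. */
    static List<String> merge(List<String> tokens, Set<String> joinablePairs) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new PairMergingIterator(tokens.iterator(), joinablePairs);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> pairs = new HashSet<>(Arrays.asList("t2 t2a"));
        // prints [t1, t2t2a, t3]
        System.out.println(merge(Arrays.asList("t1", "t2", "t2a", "t3"), pairs));
    }
}
```

Only pairs listed in the set are joined, so every other term passes through
unchanged. A single-token buffer is enough here because the merge never
spans more than two terms.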


On Fri, Dec 21, 2012 at 3:34 PM, Steve Rowe <sa...@gmail.com> wrote:

> Hi David,
>
> Not very many people read this mailing list - I suggest you switch to the
> java-user list - see <http://lucene.apache.org/core/discussion.html>.
>
> ShingleFilter and CommonGramsFilter combine terms, though the conditions
> under which they do so don't appear to be the same as what you want.
>
> Why are only the second two terms combined?
>
> Steve



Re: Which token filter can combine 2 terms into 1?

Posted by Steve Rowe <sa...@gmail.com>.
Hi David,

Not very many people read this mailing list - I suggest you switch to the java-user list - see <http://lucene.apache.org/core/discussion.html>.

ShingleFilter and CommonGramsFilter combine terms, though the conditions under which they do so don't appear to be the same as what you want.
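
To illustrate why shingling differs from the request, here is a small stand-alone sketch of what two-term shingling with unigrams kept produces for the sample input. It does not call Lucene; the class and method names are hypothetical, and the single-space separator is an assumption meant to mirror the default of Lucene's ShingleFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of bigram shingling; this is not Lucene's ShingleFilter. */
final class ShingleSketch {
    /** Returns every unigram, each followed by the bigram that starts at that position. */
    static List<String> shingles(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + separator + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [t1, t1 t2, t2, t2 t2a, t2a, t2a t3, t3]
        System.out.println(shingles(Arrays.asList("t1", "t2", "t2a", "t3"), " "));
    }
}
```

Every adjacent pair is joined and the original terms are kept as well, whereas the request is to join one specific pair and drop its two components.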

Why are only the second two terms combined?

Steve
