Posted to java-user@lucene.apache.org by Kevin <gl...@gmail.com> on 2013/11/06 00:32:20 UTC

Modify the StandardTokenizerFactory to concatenate all words

Currently I'm using StandardTokenizerFactory, which tokenizes words
based on spaces. For "Toy Story" it creates the tokens toy and story.
Ideally, I would like to extend StandardTokenizerFactory to
create the tokens toy, story, and toy story. How do I do that?

Re: Modify the StandardTokenizerFactory to concatenate all words

Posted by Benson Margulies <be...@basistech.com>.
How would you expect to recognize that 'Toy Story' is a thing?


On Tue, Nov 5, 2013 at 6:32 PM, Kevin <gl...@gmail.com> wrote:

> Currently I'm using StandardTokenizerFactory which tokenizes the words
> based on spaces. For Toy Story it will create tokens toy and story.
> Ideally, I would want to extend the functionality of
> StandardTokenizerFactory to
> create tokens toy, story, and toy story. How do I do that?
>
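
One possible reading of the request above is to skip recognizing which
pairs are meaningful and simply emit every word plus every pair of adjacent
words. The sketch below shows that output shape in plain Java with no
Lucene dependency; the class and method names are hypothetical, and
lowercasing and splitting on whitespace are assumptions that only
approximate what the real analysis chain does. Lucene's ShingleFilter
(in org.apache.lucene.analysis.shingle) produces word n-grams of this kind
as a token filter, so it may be a better starting point than modifying
StandardTokenizerFactory itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not part of Lucene or Solr.
public class ShingleSketch {

    // Returns each word, followed by the two-word phrase it starts (if any).
    static List<String> tokensWithPairs(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // nothing to tokenize
        }
        // Assumption: lowercase and split on runs of whitespace.
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [toy, toy story, story]
        System.out.println(tokensWithPairs("Toy Story"));
    }
}
```

Note that this emits every adjacent pair, so a longer title also yields
pairs that are not titles in their own right; whether that is acceptable
depends on how the index is queried.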