Posted to java-user@lucene.apache.org by Gucko Gucko <gu...@googlemail.com> on 2013/06/12 20:39:20 UTC
Remove/Filter emails from a TokenStream?
Hello all,
Is there a filter I can use to remove email addresses from a TokenStream?
So far I'm using this to remove numbers and URLs, and I would like to remove
email addresses too:
Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
    new StringReader(text));
Set<String> stopTypes = new HashSet<String>();
stopTypes.add("<URL>");
stopTypes.add("<NUM>");
TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
stream = new StandardFilter(Version.LUCENE_43, stream);
stream = new LowerCaseFilter(Version.LUCENE_43, stream);
Thanks a million!
Best
Re: Remove/Filter emails from a TokenStream?
Posted by Gucko Gucko <gu...@googlemail.com>.
Hello,
I figured out how to solve this: UAX29URLEmailTokenizer tags email tokens with
the type "<EMAIL>", so I just added stopTypes.add("<EMAIL>"); next to the
other stop types.
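For anyone wondering why one extra stop type is enough, here is a minimal sketch of type-based filtering using only the JDK. It is a simplified stand-in and not Lucene's API: the class TypeFilterSketch, the Token class, and the dropTypes method are hypothetical names made up for illustration, and the token types are assigned by hand, whereas in Lucene the tokenizer sets them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for type-based token filtering; this is NOT Lucene's API.
final class TypeFilterSketch {

    // Hypothetical token: its text plus the type label a tokenizer assigned to it.
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Keep the text of every token whose type is not in stopTypes, preserving order.
    static List<String> dropTypes(List<Token> tokens, Set<String> stopTypes) {
        List<String> kept = new ArrayList<String>();
        for (Token token : tokens) {
            if (!stopTypes.contains(token.type)) {
                kept.add(token.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Types are assigned by hand here (assumption for the sketch).
        List<Token> tokens = Arrays.asList(
            new Token("mail", "<ALPHANUM>"),
            new Token("bob@example.com", "<EMAIL>"),
            new Token("42", "<NUM>"),
            new Token("http://example.com", "<URL>"),
            new Token("today", "<ALPHANUM>"));

        Set<String> stopTypes = new HashSet<String>(
            Arrays.asList("<URL>", "<NUM>", "<EMAIL>"));

        // Prints [mail, today]
        System.out.println(dropTypes(tokens, stopTypes));
    }
}
```

The filter never looks at the token text, only at the type label, so dropping another category of token is just a matter of adding its label to the set.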
On Wed, Jun 12, 2013 at 8:39 PM, Gucko Gucko <gu...@googlemail.com> wrote:
> Hello all,
>
> is there a filter I can use to remove emails from a TokenStream?
>
> so far I'm using this to remove numbers, URLs, and I would like to remove
> emails too:
>
> Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
>     new StringReader(text));
> Set<String> stopTypes = new HashSet<String>();
> stopTypes.add("<URL>");
> stopTypes.add("<NUM>");
> TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
> stream = new StandardFilter(Version.LUCENE_43, stream);
> stream = new LowerCaseFilter(Version.LUCENE_43, stream);
>
> Thanks a million!
>
> Best