Posted to java-user@lucene.apache.org by Minh Kama Yie <mi...@nuix.com.au> on 2002/10/28 06:32:32 UTC

Parsing email addresses with StandardTokenizer.

Hi all,

Please forgive me if this question has been asked before, but I can't find an answer in the documentation. The code for StandardTokenizer is a little too deep to go into right now :), so I thought I'd post to the list first.

If I'm using the standard analyzer, which in turn uses StandardTokenizer, how would the following email addresses be parsed?

- tom.jones@abc.com
- sheryl@abc.com

If I did a search for "abc.com", which entries should turn up?
Right now I'm only getting tom.jones@abc.com. If this is correct, what are the standard tokenizing rules regarding the "@" sign, and where can I read up on them without deciphering the hexadecimal values in StandardTokenizer?

I've basically been asked why the document for sheryl@abc.com doesn't turn up in the search results for "abc.com".
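
To make the symptom concrete, here is a toy sketch in plain Java (no Lucene calls) of one hypothetical rule set that would produce exactly the results described above. The three patterns and the class name ToyEmailTokens are assumptions made up for illustration; they are not taken from StandardTokenizer's grammar, and I don't know whether it actually works this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyEmailTokens {
    // Hypothetical rules (an assumption, NOT Lucene's grammar), tried in this order:
    //   EMAIL = word "@" word ("." word)+   (local part may not contain a dot)
    //   HOST  = word ("." word)+
    //   WORD  = word
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9]+@[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+"
        + "|[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+"
        + "|[A-Za-z0-9]+");

    // Collect every token the toy rules find; characters that match no rule
    // (such as a bare "@") are skipped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Dotted local part: the EMAIL rule fails, so the address splits in two.
        System.out.println(tokenize("tom.jones@abc.com")); // [tom.jones, abc.com]
        // Simple local part: the whole address stays a single token.
        System.out.println(tokenize("sheryl@abc.com"));    // [sheryl@abc.com]
    }
}
```

Under these assumed rules, only the first address yields a separate "abc.com" token, so only that document would match a search for "abc.com". Whether the real tokenizer behaves like this is exactly what I'm asking.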

Thanks in advance.

Regards,

Minh Kama Yie
