Posted to java-user@lucene.apache.org by Minh Kama Yie <mi...@nuix.com.au> on 2002/10/28 06:32:32 UTC
Parsing email addresses with StandardTokenizer.
Hi all,
Please forgive me if this question has been asked elsewhere, but I can't seem to find an answer in the documentation. The code for StandardTokenizer is a little too deep to go into right now :), so I thought I'd post to the list first.
If I'm using the standard analyzer, which in turn uses StandardTokenizer, how would the following email addresses be parsed?
- tom.jones@abc.com
- sheryl@abc.com
If I did a search for "abc.com", which entries should turn up?
Right now I'm only getting tom.jones@abc.com. If this is correct, what are the standard tokenizing rules regarding the "@" sign, and where can I read up on them without having to decode the hexadecimal values in StandardTokenizer?
I've basically been asked why the document for sheryl@abc.com doesn't turn up in the search results for "abc.com".
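To make the question concrete, here is a toy model of one rule set that would reproduce the behaviour described above. It is only a guess for illustration: the class name ToyEmailTokens, the tokens method and the three regex rules are all hypothetical and are not taken from StandardTokenizer. The assumption is that an address is kept as a single token only when there is no dot before the "@", and that dotted runs are otherwise treated as host-like tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyEmailTokens {
    // HYPOTHETICAL rules, not copied from StandardTokenizer:
    //   EMAIL = word "@" word ("." word)+   (no dot allowed before the "@")
    //   HOST  = word ("." word)+
    //   WORD  = a run of letters and digits
    // Alternatives are tried in this order at each position.
    private static final Pattern TOKEN = Pattern.compile(
        "[a-z0-9]+@[a-z0-9]+(?:\\.[a-z0-9]+)+"   // EMAIL
        + "|[a-z0-9]+(?:\\.[a-z0-9]+)+"          // HOST
        + "|[a-z0-9]+");                          // WORD

    // Lower-cases the text and returns every token the toy rules find.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("tom.jones@abc.com")); // [tom.jones, abc.com]
        System.out.println(tokens("sheryl@abc.com"));    // [sheryl@abc.com]
    }
}
```

Under these assumed rules, "abc.com" is a token of its own only for the first address, which would match what I'm seeing. Whether the real tokenizer behaves this way is exactly what I'd like to confirm.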
Thanks in advance.
Regards,
Minh Kama Yie