Posted to solr-user@lucene.apache.org by Mike Richmond <ri...@gmail.com> on 2006/06/21 20:50:28 UTC
Custom E-mail Tokenizer
I have created a custom e-mail tokenizer and am trying to make e-mail
addresses more searchable inside of Solr (without having to rely on
wildcard/prefix queries), but am running into a couple of problems
using it.
Given the e-mail address "java-user@lucene.apache.org", the tokenizer
produces the following tokens (this was discussed on the Java Lucene
users group and can be found here:
http://www.nabble.com/indexing-emails-t1800267.html#a4932444):
java-user@lucene.apache.org
java
user
java-user
lucene.apache.org
lucene
apache.org
org
I then added the following to my schema configuration:
<fieldtype name="email" class="solr.StrField">
  <analyzer type="index">
    <tokenizer class="com.willetts.wmail.analysis.EmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
If I then fire up Solr and use the analysis tool from the admin page,
it seems to work exactly as I would expect (i.e. e-mail addresses that
I type in do get broken up into the correct tokens). However, when I
add data to this index and then attempt to perform a search using the
search interface, I cannot get any matches. For example, when I add
"richmondmike@gmail.com" to a field that has type "email" (see schema
configuration above), I cannot get the terms "richmondmike", "gmail",
or "gmail.com" to match any of the results.
Do I need to use a custom fieldtype class as well instead of using
"solr.StrField"? Any help would be greatly appreciated.
Thanks in advance,
Mike
Re: Custom E-mail Tokenizer
Posted by Mike Richmond <ri...@gmail.com>.
Worked like a champ. Thanks for the quick reply.
--Mike
On 6/21/06, Chris Hostetter <ho...@fucit.org> wrote:
>
> : <fieldtype name="email" class="solr.StrField">
> : <analyzer type="index">
> : <tokenizer
> : class="com.willetts.wmail.analysis.EmailTokenizerFactory"/>
> : <filter class="solr.LowerCaseFilterFactory"/>
> : </analyzer>
> : </fieldtype>
>
> Try changing the fieldtype class to solr.TextField ... i've never seen
> anyone try to use an analyzer with StrField (if you'd asked me before you
> tried it, i would have guessed the schema file wouldn't have even loaded
> properly)
>
>
> -Hoss
>
>
Re: Custom E-mail Tokenizer
Posted by Chris Hostetter <ho...@fucit.org>.
: <fieldtype name="email" class="solr.StrField">
: <analyzer type="index">
: <tokenizer
: class="com.willetts.wmail.analysis.EmailTokenizerFactory"/>
: <filter class="solr.LowerCaseFilterFactory"/>
: </analyzer>
: </fieldtype>
Try changing the fieldtype class to solr.TextField ... i've never seen
anyone try to use an analyzer with StrField (if you'd asked me before you
tried it, i would have guessed the schema file wouldn't have even loaded
properly)
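For illustration, here is a sketch of the fieldtype from the original post with only the class attribute changed to solr.TextField. Everything else is copied unchanged, and it is an untested assumption that nothing else in the schema needs to change. StrField indexes the whole value as a single untokenized term, which would explain why partial terms such as "gmail" did not match.

```xml
<!-- Sketch only: same fieldtype as in the original post, with StrField swapped for TextField -->
<fieldtype name="email" class="solr.TextField">
  <analyzer type="index">
    <!-- Custom tokenizer factory from the original post, unchanged -->
    <tokenizer class="com.willetts.wmail.analysis.EmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

Documents added under the old definition would presumably need to be re-added, since the tokens are only produced at index time.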
-Hoss