Posted to solr-user@lucene.apache.org by Kai Gülzau <kg...@novomind.com> on 2012/03/16 14:59:24 UTC

mailto: scheme aware tokenizer

Is there any analyzer out there which handles the mailto: scheme?

UAX29URLEmailTokenizer seems to split at the wrong place:

mailto:test@example.org ->
mailto:test
example.org

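As a workaround, I use a char filter that inserts a space after the scheme, so the address is separated from "mailto:" before tokenization: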

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="mailto:" replacement="mailto: "/>
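To illustrate what this char filter does to the input text before the tokenizer sees it, here is a minimal Java sketch using only java.util.regex. It does not call Lucene or Solr; the class name MailtoWorkaround and the helper insertSpace are hypothetical, and it assumes the filter applies the pattern with replace-all semantics.

```java
import java.util.regex.Pattern;

public class MailtoWorkaround {
    // Same pattern as in the charFilter definition above.
    private static final Pattern MAILTO = Pattern.compile("mailto:");

    // Hypothetical helper: mimics the text rewrite performed before
    // tokenization by replacing every "mailto:" with "mailto: ".
    public static String insertSpace(String input) {
        return MAILTO.matcher(input).replaceAll("mailto: ");
    }

    public static void main(String[] args) {
        // Prints the rewritten text that the tokenizer would receive.
        System.out.println(insertSpace("mailto:test@example.org"));
    }
}
```

With the space in place, the scheme and the address are no longer adjacent, so the tokenizer can treat test@example.org as a candidate email token on its own.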

Regards,

Kai Gülzau

novomind AG
__________________________________

Bramfelder Straße 121 • 22305 Hamburg

phone +49 (0)40 808071138 • fax +49 (0)40 808071-100
email kguelzau@novomind.com • http://www.novomind.com

Vorstand : Peter Samuelsen (Vors.) • Stefan Grieben • Thomas Köhler
Aufsichtsratsvorsitzender: Werner Preuschhof
Gesellschaftssitz: Hamburg • HR B93508 Amtsgericht Hamburg

RE: mailto: scheme aware tokenizer

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Kai,

I have created an issue for this: https://issues.apache.org/jira/browse/LUCENE-3880

Thanks for reporting!

Steve
