Posted to solr-user@lucene.apache.org by Ma...@rzf.fin-nrw.de on 2010/07/29 16:25:04 UTC

search with special chars like € @ % §

Hi,
what is the best way to handle searches containing special characters like § (section sign), € (euro), @ (the at sign in e-mail addresses), % and so forth?
I think the WordDelimiterFilter acts on such characters (at index time and at query time).

The biggest problem I see is that there can be an optional space between those characters and the numbers, like 50% or 50 %, or §235 or § 235.
So even if I get the WordDelimiterFilter (or any other filter) configured correctly for those characters, I don't see a way to deal with the optional spaces.
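
To illustrate the mismatch, here is a rough Python model (not Solr's actual analyzers) of two tokenization strategies. The function names are hypothetical: whitespace_tokens stands in for a whitespace tokenizer, and delimiter_tokens approximates a filter that splits on symbols and discards them.

```python
import re
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    # Split on runs of whitespace only, keeping symbols attached to numbers.
    return text.split()


def delimiter_tokens(text: str) -> List[str]:
    # Keep only runs of word characters, discarding symbols like % and §.
    return re.findall(r"\w+", text)


print(whitespace_tokens("50%"), whitespace_tokens("50 %"))  # ['50%'] ['50', '%']
print(delimiter_tokens("§235"), delimiter_tokens("§ 235"))  # ['235'] ['235']
```

The first strategy keeps the symbol but tokenizes the two spellings differently. The second is consistent but loses the symbol, so in this model a query for §235 would also match a bare 235.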

Does anyone have a solution for this?

Markus

Re: search with special chars like € @ % §

Posted by Erick Erickson <er...@gmail.com>.
Could you provide some more details on your use case? This sounds like an XY
problem (see http://people.apache.org/~hossman/#xyproblem). The reason I
say this is that you're probably going to shoot yourself in the foot if you
require such symbols, leading to an "interesting" user experience.

That said, you can pre-process your data for both indexing and searching
by, say, applying a regex that strategically removes the spaces you care
about and then using, say, the WhitespaceTokenizer. I'd also apply a
LowerCaseFilter.
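
A minimal Python sketch of this suggestion, assuming the symbols of interest are the ones from the question (§, € , @, %) and that a space should be dropped only between one of them and an adjacent digit. SYMBOLS, normalize and analyze are hypothetical names. In Solr the same idea would normally live in the field's analyzer chain (for example a pattern-replace char filter ahead of the tokenizer, if your Solr version provides one) rather than in application code.

```python
import re
from typing import List

# Hypothetical set of symbols that should stay attached to numbers.
SYMBOLS = "§€@%"
_CLASS = "[" + re.escape(SYMBOLS) + "]"

# Whitespace between a symbol and a following digit: "§ 235" -> "§235".
_SYMBOL_THEN_DIGIT = re.compile("(" + _CLASS + r")\s+(?=\d)")
# Whitespace between a digit and a following symbol: "50 %" -> "50%".
_DIGIT_THEN_SYMBOL = re.compile(r"(?<=\d)\s+(" + _CLASS + ")")


def normalize(text: str) -> str:
    """Remove only the spaces that separate a symbol from an adjacent number."""
    text = _SYMBOL_THEN_DIGIT.sub(r"\1", text)
    return _DIGIT_THEN_SYMBOL.sub(r"\1", text)


def analyze(text: str) -> List[str]:
    """Same chain at index and query time: normalize, split on whitespace, lowercase."""
    return [token.lower() for token in normalize(text).split()]


# Index-time and query-time spellings now produce the same tokens.
print(analyze("Discount 50 % under § 235"))  # ['discount', '50%', 'under', '§235']
print(analyze("discount 50% under §235"))    # ['discount', '50%', 'under', '§235']
```

The key design point is applying the identical chain on both sides. Note that only spaces are removed, so €50 and 50€ still yield different tokens; unifying those would need a further rule.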

HTH
Erick
