Posted to solr-user@lucene.apache.org by Robert Brown <ro...@intelcompute.com> on 2012/02/06 10:39:57 UTC

Symbols in synonyms

Is it good practice, common, or even possible to put symbols in my 
list of synonyms?

I'm having trouble indexing and searching for "A&E", with it being 
split on the &.

We already convert .net to dotnet, but don't want to store every 
combination of two letters (A&E, M&E, etc.).




--

IntelCompute
Web Design & Local Online Marketing

http://www.intelcompute.com


Re: Symbols in synonyms

Posted by Erick Erickson <er...@gmail.com>.
You're probably looking at a custom tokenizer and/or filter chain here. Or
at least creatively combining the ones that exist. The admin/analysis
page will be your friend.

Even if you define these as synonyms, the rest of the analysis chain may
break them up so you really have to look at the effects of the entire
analysis chain. I'd start with a really simple one (not the stock ones) and
build up. Especially beware of WordDelimiterFilterFactory for instance....
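
To make the point about the whole chain concrete, here is a toy sketch in
Python. It is not Solr code: the two tokenizers, the analyze() helper and the
synonym entry are all made up for illustration. It only shows that a synonym
rule for "a&e" can never fire if an earlier step has already split the token
on the &.

```python
import re

# Toy synonym table (hypothetical entry, for illustration only).
SYNONYMS = {"a&e": ["a&e", "accident and emergency"]}


def whitespace_tokenize(text):
    """Split on whitespace only, so 'A&E' survives as one token."""
    return text.split()


def symbol_splitting_tokenize(text):
    """Keep only runs of letters and digits, so 'A&E' becomes 'A' and 'E'."""
    return re.findall(r"[A-Za-z0-9]+", text)


def analyze(text, tokenizer):
    """Run a tiny chain: tokenizer -> lowercase -> synonym expansion."""
    out = []
    for token in tokenizer(text):
        token = token.lower()
        # Tokens without a synonym entry pass through unchanged.
        out.extend(SYNONYMS.get(token, [token]))
    return out


print(analyze("visit A&E now", whitespace_tokenize))
print(analyze("visit A&E now", symbol_splitting_tokenize))
```

With the whitespace tokenizer the synonym entry matches; with the
symbol-splitting one the synonym step only ever sees "a" and "e". The
admin/analysis page shows the same kind of step-by-step output for a real
field type.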

Best
Erick


Re: Symbols in synonyms

Posted by Chris Hostetter <ho...@fucit.org>.
: is it good practice, common, or even possible to put symbols in my list of
: synonyms?

It depends entirely on your use cases, and whether you want words with those 
symbols to have synonyms.

: I'm having trouble indexing and searching for "A&E", with it being split on
: the &.

That sounds like you are using an analyzer that doesn't do what you want 
-- I'm guessing you should pick a different tokenizer, but the problem may 
also be whether and how you are using WordDelimiterFilter.
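
As a rough illustration of the second possibility, the Python sketch below
mimics a filter that splits tokens on intra-word delimiters after the
tokenizer has run. It is a simplified stand-in, not the real
WordDelimiterFilter; delimiter_filter() and its keep_chars parameter are
hypothetical names. The idea is that a token kept whole by the tokenizer can
still be broken up later unless the filter is told to leave that character
alone.

```python
import re


def split_on_delimiters(token, keep_chars=""):
    """Split a token on every non-alphanumeric character not in keep_chars."""
    pattern = "[^A-Za-z0-9" + re.escape(keep_chars) + "]+"
    return [part for part in re.split(pattern, token) if part]


def delimiter_filter(tokens, keep_chars=""):
    """Apply the split to each token and flatten the result."""
    out = []
    for token in tokens:
        out.extend(split_on_delimiters(token, keep_chars))
    return out


tokens = "A&E wi-fi".split()  # a whitespace tokenizer keeps 'A&E' whole

print(delimiter_filter(tokens))                  # '&' is treated as a delimiter
print(delimiter_filter(tokens, keep_chars="&"))  # '&' is treated as part of the word
```

In the first call the & is lost even though the tokenizer preserved it; in
the second only the hyphen is split on.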


-Hoss