Posted to solr-user@lucene.apache.org by Nils Weinander <ni...@gmail.com> on 2011/06/14 11:18:52 UTC
ISOLatin1AccentFilterFactory vs ASCIIFoldingFilterFactory
Hi all, I'm new to the list (but not totally new to Solr).
The documentation states that ISOLatin1AccentFilterFactory is deprecated
in favour of ASCIIFoldingFilterFactory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory
I see problems with this. If I have understood ASCIIFoldingFilterFactory
correctly, it folds both accented characters like 'é' to 'e' and national
characters like 'ö' to 'o'. The former is desirable; the latter is very much
not when indexing, for example, Scandinavian languages. Is there a way to
limit which characters are folded?
--
____________________________________________________________
Nils Weinander
RE: ISOLatin1AccentFilterFactory vs ASCIIFoldingFilterFactory
Posted by Steven A Rowe <sa...@syr.edu>.
On 6/14/2011 at 7:12 AM, Ahmet Arslan wrote:
> --- On Tue, 6/14/11, Nils Weinander <ni...@gmail.com> wrote:
> > The documentation states that ISOLatin1AccentFilterFactory
> > is deprecated in favour of ASCIIFoldingFilterFactory:
[...]
> > Is there a way to limit which characters are folded?
>
> With MappingCharFilterFactory you have full control over which
> characters are folded. You can see the default mappings in the
> mapping-ISOLatin1Accent.txt file.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.MappingCharFilterFactory
There is also mapping-FoldToASCII.txt, which, when used with MappingCharFilterFactory, corresponds to ASCIIFoldingFilterFactory.
Steve
Re: ISOLatin1AccentFilterFactory vs ASCIIFoldingFilterFactory
Posted by Nils Weinander <ni...@gmail.com>.
On Tue, Jun 14, 2011 at 1:11 PM, Ahmet Arslan <io...@yahoo.com> wrote:
>
> With MappingCharFilterFactory you have full control over which characters are folded. You can see the default mappings in the
> mapping-ISOLatin1Accent.txt file.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.MappingCharFilterFactory
Thanks Ahmet! Exactly what I needed.
____________________________________________________________
Nils Weinander
Re: ISOLatin1AccentFilterFactory vs ASCIIFoldingFilterFactory
Posted by Ahmet Arslan <io...@yahoo.com>.
--- On Tue, 6/14/11, Nils Weinander <ni...@gmail.com> wrote:
> From: Nils Weinander <ni...@gmail.com>
> Subject: ISOLatin1AccentFilterFactory vs ASCIIFoldingFilterFactory
> To: solr-user@lucene.apache.org
> Date: Tuesday, June 14, 2011, 12:18 PM
> Hi all, I'm new to the list (but not totally new to Solr).
>
> The documentation states that ISOLatin1AccentFilterFactory is deprecated
> in favour of ASCIIFoldingFilterFactory:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory
>
> I see problems with this. If I have understood ASCIIFoldingFilterFactory
> correctly, it folds both accented characters like 'é' to 'e' and national
> characters like 'ö' to 'o'. The former is desirable; the latter is very much
> not when indexing, for example, Scandinavian languages. Is there a way to
> limit which characters are folded?
With MappingCharFilterFactory you have full control over which characters are folded. You can see the default mappings in the
mapping-ISOLatin1Accent.txt file.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.MappingCharFilterFactory
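For illustration, here is a minimal sketch of how this might look in schema.xml. The field type name "text_sv" and the file name "mapping-custom.txt" are made up for this example. The idea is that the custom file is a trimmed copy of mapping-ISOLatin1Accent.txt in which the entries for characters that should be preserved (such as 'ö', 'ä' and 'å') have been removed. The tokenizer and lowercase filter are only one plausible choice, not something prescribed in this thread.

```xml
<!-- schema.xml (sketch): fold only the characters listed in a custom mapping file.
     "text_sv" and "mapping-custom.txt" are hypothetical names.
     Example entries in mapping-custom.txt, using the same syntax as
     mapping-ISOLatin1Accent.txt:
       "\u00E9" => "e"
       "\u00E8" => "e"
     There are deliberately no entries for \u00F6, \u00E4 or \u00E5,
     so those characters pass through unchanged. -->
<fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run on the raw text before tokenization. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-custom.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the char filter runs before the lowercase filter in this sketch, upper-case variants would need their own entries in the mapping file (for example "\u00C9" => "E"). Using the same analyzer at index and query time, as above, keeps the folding consistent on both sides.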