Posted to solr-user@lucene.apache.org by Gastone Penzo <ga...@gmail.com> on 2012/06/06 16:43:22 UTC

problem with mapping-iso accents

Hi,
I have a problem with the ISO accent mapping char filter.

I have a field in my schema with this filter:

<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>

If I try this filter with the analysis tool in the Solr admin panel, it works.

For example:

sarà => sara

But when I build the index it doesn't work: in the index the field is still
"sarà", with the accent. Why?

I use a MySQL connector to build the index directly from a MySQL database.

The MySQL database is in UTF-8, the connector charset is UTF-8, and Solr uses
UTF-8 by default.

I recently switched my Java from OpenJDK to the Sun JDK. Could that be the
reason?

Thanks



-- 
Gastone Penzo

Re: problem with mapping-iso accents

Posted by Erick Erickson <er...@gmail.com>.
First, please post usage/configuration questions on the user list; see:
http://lucene.apache.org/solr/discussion.html. The dev list is intended for
discussing development issues, bugs, etc.

You're probably being fooled by setting stored="true". When you return
the value of a field in a document (via the "fl" parameter or similar), you
get the original, unanalyzed value, not the analyzed terms. To see what is
actually indexed for the field, try the admin/schema browser page or the
TermsComponent (see: http://wiki.apache.org/solr/TermsComponent).
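
To make the stored/indexed distinction concrete, here is a toy model in Python. It is only a sketch and not how Solr works internally: the class ToyIndex, the function analyze, and the small accent table are all made up for illustration. It keeps the original text as the "stored" value and puts only the accent-folded terms in the "index".

```python
# Toy model only: NOT Solr's implementation. All names here are hypothetical.

# A tiny stand-in for a mapping file such as mapping-ISOLatin1Accent.txt.
ACCENT_MAP = str.maketrans({"à": "a", "è": "e", "é": "e",
                            "ì": "i", "ò": "o", "ù": "u"})


def analyze(text):
    """Fold accents (like a mapping char filter), then split on whitespace."""
    return text.translate(ACCENT_MAP).split()


class ToyIndex:
    def __init__(self):
        self.stored = {}    # doc id -> original, unanalyzed text
        self.inverted = {}  # analyzed term -> set of doc ids

    def add(self, doc_id, text):
        # The stored value is kept exactly as it was sent in.
        self.stored[doc_id] = text
        # Only the analyzed terms go into the inverted index.
        for term in analyze(text):
            self.inverted.setdefault(term, set()).add(doc_id)

    def search(self, query):
        """Analyze the query the same way, then return the stored values."""
        ids = set()
        for term in analyze(query):
            ids |= self.inverted.get(term, set())
        return [self.stored[i] for i in sorted(ids)]


index = ToyIndex()
index.add(1, "sarà")
print(sorted(index.inverted))  # indexed terms
print(index.search("sara"))    # stored values of matching documents
```

In this model the indexed term has no accent while the value handed back to the caller still does, which matches the symptom described in the question.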

A quick test would be to search for the unaccented version and see if the
document is found.
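
As a rough sketch of that test, the Python below only builds the two request URLs. The host, port, and the field name "title" are assumptions, so replace them with your own values; the second URL also assumes a /terms request handler backed by the TermsComponent is configured in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust the base URL and field name to your setup.
SOLR_BASE = "http://localhost:8983/solr"
FIELD = "title"


def select_url(term, base=SOLR_BASE, field=FIELD):
    """URL that searches the field for a term and returns its stored value."""
    params = {"q": f"{field}:{term}", "fl": field, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def terms_url(prefix, base=SOLR_BASE, field=FIELD):
    """URL that lists indexed terms of the field starting with a prefix.

    Assumes a /terms handler using the TermsComponent is configured.
    """
    params = {"terms.fl": field, "terms.prefix": prefix, "wt": "json"}
    return f"{base}/terms?{urlencode(params)}"


print(select_url("sara"))  # search for the unaccented form
print(terms_url("sar"))    # inspect the terms that were actually indexed
```

If the first request finds the document and the second lists "sara" rather than "sarà", the char filter is being applied at index time and only the stored value keeps the accent.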

Best
Erick

