Posted to solr-user@lucene.apache.org by binoybt <bi...@gmail.com> on 2012/08/31 16:34:15 UTC

UTF-8 without BOM French characters issue

Hi


I am facing an issue with French characters being converted to junk
characters after indexing.

I am using XPathEntityProcessor to index an XML file that I generate with a
Java application.

<entity name="feedback" pk="PARTY_ID" url="${file.fileAbsolutePath}"
processor="XPathEntityProcessor" forEach="/docs/add"
transformer="TemplateTransformer,RegexTransformer,DateFormatTransformer,com.solr.custom.timestamp.TimestampTransformer">

After digging into the issue, I found that the cause seems to be that the XML
files are saved in the format "UTF-8 without BOM". Is there a way to resolve
this?

French characters: étaient état
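
To narrow down whether the file itself is the problem, the bytes can be
checked directly from Java. Below is a minimal sketch, not a confirmed fix:
the class name EncodingCheck and its two helper methods are hypothetical, and
only standard JDK calls are used. It reports whether a file starts with a
UTF-8 BOM and whether its bytes decode as strict UTF-8. If the file is valid
UTF-8 with no BOM, the file is probably fine and the junk characters are more
likely introduced when the file is read or displayed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Solr or the original application.
class EncodingCheck {

    // True if the bytes begin with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    // True if the bytes decode as strict UTF-8. Malformed input is reported
    // as an error instead of being silently replaced.
    static boolean isValidUtf8(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java EncodingCheck <path-to-xml-file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Starts with UTF-8 BOM: " + hasUtf8Bom(bytes));
        System.out.println("Valid UTF-8:           " + isValidUtf8(bytes));
    }
}
```

If "Valid UTF-8" comes back false, the generating application is most likely
writing the file in a different charset (for example the platform default)
than the one it is assumed to be in.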

Thanks
Binoy


