Posted to solr-user@lucene.apache.org by Merlin Morgenstern <me...@googlemail.com> on 2012/03/16 13:00:49 UTC

utf8 encoding for solr not working

I am running Solr 3.5 with a MySQL data connector. Solr is configured to
use UTF-8 as its encoding:

<dataConfig>
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/test"
            user="solr"
            password="tester"
            convertType="true"
            batchSize="-1"
            encoding="UTF-8" />

Unfortunately, Solr encodes special characters like "ä" as
HTML entities:

&auml;

which leads to problems when cutting strings with PHP's mb_substr(..)

How can I configure Solr to deliver UTF-8 instead of HTML entities?

Thank you for any help.

Re: utf8 encoding for solr not working

Posted by Tanguy Moal <ta...@gmail.com>.
I assume you are querying Solr from PHP.

You can ask solr to respond in several different formats (xml, json, 
php, ...), see http://wiki.apache.org/solr/QueryResponseWriter .
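
As a rough illustration of selecting a response format, the sketch below builds a select URL with the wt parameter and pulls the documents out of a JSON response. It is written in Python only to keep it short. The host, port, path, field names, and the sample response body are assumptions made up for the example, not taken from your setup.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, writer="json"):
    """Return a Solr select URL that asks for the given response writer (wt)."""
    params = {"q": query, "wt": writer}
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    return base_url + "?" + urlencode(params)


def extract_docs(raw_body):
    """Parse a JSON response body and return its list of documents."""
    return json.loads(raw_body)["response"]["docs"]


# Hypothetical response body, shaped like the output of the JSON writer.
SAMPLE_BODY = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{"id": "1", "title": "Käse"}],
    },
})

if __name__ == "__main__":
    # Assumed endpoint; adjust to wherever your Solr instance listens.
    print(build_select_url("http://localhost:8983/solr/select", "title:Käse"))
    print(extract_docs(SAMPLE_BODY)[0]["title"])
```

If the entities are still present in the JSON output, they are most likely already part of the indexed data rather than something the response writer adds.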

Depending on how you connect to solr from php, you may want to use 
html_entity_decode before using mb_substr.
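
To show why the order matters, here is a minimal sketch in Python of "decode first, then cut". html.unescape stands in for PHP's html_entity_decode, and string slicing stands in for mb_substr; the function name decode_and_truncate and the sample string are hypothetical. In PHP the corresponding call would be html_entity_decode with its charset argument set to 'UTF-8'.

```python
import html


def decode_and_truncate(text, max_chars):
    """Decode HTML entities, then keep at most max_chars characters."""
    decoded = html.unescape(text)  # "&auml;" becomes "ä"
    # Python slices str by character, so "ä" counts as one position.
    return decoded[:max_chars]


if __name__ == "__main__":
    raw = "K&auml;sekuchen"
    # Cutting before decoding can split the entity in the middle.
    print(raw[:4])
    # Cutting after decoding keeps whole characters.
    print(decode_and_truncate(raw, 4))
```

Cutting the raw string first leaves a broken fragment of the entity, which is the kind of problem you describe with mb_substr.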

--
Tanguy

On 16/03/2012 13:00, Merlin Morgenstern wrote:
> I am running solr 3.5 with a mysql data connector. Solr is configured to
> use UTF8 as encoding:
>
> <dataConfig>
> <dataSource type="JdbcDataSource"
>              driver="com.mysql.jdbc.Driver"
>              url="jdbc:mysql://localhost/test"
>              user="solr"
>              password="tester"
>              convertType="true"
>              batchSize="-1"
>              encoding="UTF-8" />
>
> Unfortunately, Solr encodes special characters like "ä" as
> HTML entities:
>
> &auml;
>
> which leads to problems when cutting strings with php mb_substr(..)
>
> How can I configure solr to deliver UTF-8 instead of htmlentities?
>
> Thank you for any help.
>