Posted to solr-user@lucene.apache.org by Sascha Szott <sz...@zib.de> on 2009/11/18 14:48:36 UTC
VelocityResponseWriter/Solritas character encoding issue
Hi,
I've played around with Solr's VelocityResponseWriter (which is indeed a
very useful feature for rapid prototyping). I've realized that Velocity
uses ISO-8859-1 as default character encoding. I've changed this setting
to UTF-8 in my velocity.properties file (inside the conf directory), i.e.,
input.encoding=UTF-8
output.encoding=UTF-8
and checked that the settings were successfully loaded.
Within the main Velocity template, browse.vm, the character encoding is
set to UTF-8 as well, i.e.,
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
After starting Solr (which is deployed in a Tomcat 6 server on an Ubuntu
machine), I ran into some character encoding problems.
Due to the change of input.encoding to UTF-8, no problems occur when
non-ASCII characters are present in the query string, e.g. German
umlauts. But unfortunately, something is wrong with the encoding of
characters in the HTML page that is generated by VelocityResponseWriter.
The non-ASCII characters aren't displayed properly (for example, Firefox
prints a black diamond with a white question mark). If I manually set
the encoding to ISO-8859-1, the non-ASCII characters are displayed
correctly. Does anybody have a clue?
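The symptom described above (replacement characters under UTF-8, correct
display under ISO-8859-1) is what one would expect if the response bytes
are ISO-8859-1 while the reader decodes them as UTF-8. The following is a
minimal Python sketch of that mismatch, not a trace of what Solr or
Velocity actually do; the sample word is an assumption chosen for
illustration.

```python
# Sketch: what happens when ISO-8859-1 bytes are read as UTF-8.
# Assumption: the writer emits Latin-1 bytes while the reader
# decodes them as UTF-8, as the page's meta tag requests.

text = "M\u00fcller"  # "Müller", a sample word with a German umlaut

# In ISO-8859-1 the umlaut is the single byte 0xFC.
latin1_bytes = text.encode("iso-8859-1")

# Read as UTF-8, the lone byte 0xFC is invalid and is replaced by
# U+FFFD, which browsers draw as a black diamond with a question mark.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Read as ISO-8859-1, the original text comes back unchanged.
as_latin1 = latin1_bytes.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

If this reading is right, the fix lies in making the writer and the
declared charset agree, rather than in the template itself.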
Thanks in advance,
Sascha
Re: VelocityResponseWriter/Solritas character encoding issue
Posted by Sascha Szott <sz...@zib.de>.
Hi Lance,
Lance Norskog wrote:
> What platform are you using? Windows does not use UTF-8 by default,
> and this can cause subtle problems. If you can do the same thing on
> other platforms (Linux, Mac) that would help narrow down the problem.
My Solr server runs in a Tomcat server on an Ubuntu Linux machine.
-Sascha
Re: VelocityResponseWriter/Solritas character encoding issue
Posted by Lance Norskog <go...@gmail.com>.
What platform are you using? Windows does not use UTF-8 by default,
and this can cause subtle problems. If you can do the same thing on
other platforms (Linux, Mac) that would help narrow down the problem.
--
Lance Norskog
goksron@gmail.com
Re: VelocityResponseWriter/Solritas character encoding issue
Posted by Sascha Szott <sz...@zib.de>.
Hi Erik,
Erik Hatcher wrote:
> Can you give me a test document that causes an issue? (maybe send me a
> Solr XML document in private e-mail). I'll see what I can do once I
> can see the issue first hand.
Thank you! Just try the utf8-example.xml file in the exampledocs
directory. After indexing the document, the output of the script
test_utf8.sh suggests that everything works correctly:
  Solr server is up.
  HTTP GET is accepting UTF-8
  HTTP POST is accepting UTF-8
  HTTP POST does not default to UTF-8
  HTTP GET is accepting UTF-8 beyond the basic multilingual plane
  HTTP POST is accepting UTF-8 beyond the basic multilingual plane
  HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual
If I'm using the standard QueryResponseWriter and the query q=umlauts,
the XML response contains properly printed non-ASCII characters.
The same query against the VelocityResponseWriter returns a lot of
Unicode replacement characters (U+FFFD) instead.
-Sascha
[Solved] Re: VelocityResponseWriter/Solritas character encoding issue
Posted by Sascha Szott <sz...@zib.de>.
Hi Erik,
I've finally solved the problem. Unfortunately, the parameter
v.contentType was not described in the Solr wiki (I've fixed that now).
The point is, you must specify (in your solrconfig.xml)
<str name="v.contentType">text/xml;charset=UTF-8</str>
in order to receive correctly UTF-8 encoded HTML. That's it!
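To see why the charset parameter in that value matters, here is a small
Python sketch that reads the charset out of a Content-Type value using the
standard library. The helper name charset_of and the charset-less
"text/html" value are assumptions made for illustration; the latter is not
confirmed to be what VelocityResponseWriter sends by default.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


# The value from the solrconfig.xml setting above.
with_charset = charset_of("text/xml;charset=UTF-8")

# Hypothetical value without a charset parameter, for comparison.
without_charset = charset_of("text/html")

print(with_charset)
print(without_charset)
```

When no charset is announced, the client has to guess or fall back to a
default, which is one way a mismatch with the actual bytes can arise.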
Best,
Sascha
Erik Hatcher wrote:
> Sascha,
>
> Can you give me a test document that causes an issue? (maybe send me a
> Solr XML document in private e-mail). I'll see what I can do once I
> can see the issue first hand.
>
> Erik
Re: VelocityResponseWriter/Solritas character encoding issue
Posted by Erik Hatcher <er...@gmail.com>.
Sascha,
Can you give me a test document that causes an issue? (maybe send me
a Solr XML document in private e-mail). I'll see what I can do once
I can see the issue first hand.
Erik