Posted to solr-user@lucene.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2010/03/11 00:44:20 UTC

Updating FAQ for International Characters?

Hi all,

On the wiki page http://wiki.apache.org/solr/FAQ under the section
"Why don't International Characters Work?" there are a number of
options specified for dealing with a character like À (an A with a
grave accent, the agrave character).

Any time a character like that was indexed, Solr threw an unknown entity
error.  But if it is converted to &#192; or &Agrave; then everything works
great.

I tried out using Tomcat versus Jetty and got the same results.
Before I edit the FAQ, I wanted to check whether others have also been
unable to fully index documents with characters like À.

Eric



-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal









Re: Updating FAQ for International Characters?

Posted by Lance Norskog <go...@gmail.com>.
You might also try using CDATA blocks to wrap your Unicode text. It is
usually much easier to view the text while debugging these problems.

On Thu, Mar 11, 2010 at 12:13 AM, Eric Pugh
<ep...@opensourceconnections.com> wrote:
> So I am using Sunspot to post over, which means an extra layer of
> indirection between me and my XML!  I will look tomorrow.
>
>
> On Mar 10, 2010, at 7:21 PM, Chris Hostetter wrote:
>
>>
>> : Any time a character like that was indexed, Solr threw an unknown entity error.
>> : But if converted to &#192; or &Agrave; then everything works great.
>> :
>> : I tried out using Tomcat versus Jetty and got the same results.  Before I edit
>>
>> Uh, you mean like the characters in exampledocs/utf8-example.xml?
>>
>> It contains literal UTF-8 characters, and it works fine.
>>
>> Based on your "&#192;" comment I assume you are posting XML ... are you
>> sure you are using the UTF-8 charset?
>>
>> -Hoss
>>



-- 
Lance Norskog
goksron@gmail.com
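
For anyone following along, the CDATA suggestion above can be sketched
roughly as below. This is only an illustration in Python, not what Sunspot
or Solr does internally; the cdata_field helper and the field name "name"
are made up for the example, and the field name is assumed to need no
escaping. Note that CDATA only avoids escaping & and <; the bytes on the
wire still have to be sent in the declared charset.

```python
import xml.etree.ElementTree as ET


def cdata_field(name, text):
    """Return a <field> element string with its text wrapped in CDATA.

    Hypothetical helper for illustration; `name` is assumed to be a
    simple identifier that needs no attribute escaping.
    """
    # "]]>" would end the CDATA block early, so split it across two blocks.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return '<field name="%s"><![CDATA[%s]]></field>' % (name, safe)


# The À, the ampersand and the angle brackets all appear literally,
# which keeps the XML readable while debugging.
xml_text = "<add><doc>%s</doc></add>" % cdata_field("name", "À & <b>co</b>")

# Round-trip through a parser to confirm the text survives unchanged.
value = ET.fromstring(xml_text).find("doc/field").text
```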

Re: Updating FAQ for International Characters?

Posted by Eric Pugh <ep...@opensourceconnections.com>.
So I am using Sunspot to post over, which means an extra layer of
indirection between me and my XML!  I will look tomorrow.


On Mar 10, 2010, at 7:21 PM, Chris Hostetter wrote:

>
> : Any time a character like that was indexed, Solr threw an unknown entity error.
> : But if converted to &#192; or &Agrave; then everything works great.
> :
> : I tried out using Tomcat versus Jetty and got the same results.  Before I edit
>
> Uh, you mean like the characters in exampledocs/utf8-example.xml?
>
> It contains literal UTF-8 characters, and it works fine.
>
> Based on your "&#192;" comment I assume you are posting XML ... are you
> sure you are using the UTF-8 charset?
>
> -Hoss
>


Re: Updating FAQ for International Characters?

Posted by Chris Hostetter <ho...@fucit.org>.
: Any time a character like that was indexed, Solr threw an unknown entity error.
: But if converted to &#192; or &Agrave; then everything works great.
: 
: I tried out using Tomcat versus Jetty and got the same results.  Before I edit

Uh, you mean like the characters in exampledocs/utf8-example.xml?

It contains literal UTF-8 characters, and it works fine.

Based on your "&#192;" comment I assume you are posting XML ... are you
sure you are using the UTF-8 charset?

-Hoss
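
To make the charset question concrete, here is a minimal sketch of
building an XML update whose body really is UTF-8 and whose Content-Type
header says so. It is an illustration in Python, not what Sunspot does;
the URL, the build_update_request helper and the field names are
assumptions for the example. The request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint for illustration only; adjust to your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(doc_id, name):
    """Build (but do not send) an XML update request encoded as UTF-8."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="name").text = name
    # Serialize to bytes; non-ASCII characters become raw UTF-8 bytes,
    # and the XML declaration names the encoding explicitly.
    body = ET.tostring(add, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        # The header must agree with the bytes actually sent.
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


req = build_update_request("1", "À la carte")
```

If the body is encoded one way and the header (or the XML declaration)
claims another, a literal À is exactly the kind of character that gets
mangled, so checking both is a reasonable first step.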