Posted to solr-user@lucene.apache.org by sunnyfr <jo...@gmail.com> on 2008/10/21 17:44:51 UTC

Re: To make sure XML is UTF-8

Hi Jeffrey,

How did you manage, with your database connection in latin-1, to get your
information properly into UTF-8?
And to handle stemming and everything else?

Thanks a lot,


Tiong Jeffrey wrote:
> 
> Hi Ajanta,
> 
> thanks! Since I used PHP, I managed to use the PHP decode function to change
> it to UTF-8.
> 
> But just a question: even if we change the MySQL default charset to UTF-8,
> if the input is originally in another encoding, the MySQL engine won't
> convert it to UTF-8, right? I think my question is, what is the use of
> defining the charset in MySQL other than for labeling purposes?
> 
> Thanks!
> 
> Jeffrey
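
For anyone reading this later: the explicit conversion step described above can be sketched in Python rather than PHP. This is only a minimal illustration, assuming the input bytes really are Latin-1 (ISO-8859-1); the function name latin1_to_utf8 is made up for the example and is not from the thread.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are known to be Latin-1 (ISO-8859-1) as UTF-8."""
    # Every byte value is a valid Latin-1 character, so this decode never
    # fails. That also means it cannot detect input that was not Latin-1.
    text = raw.decode("latin-1")
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = b"caf\xe9"  # "cafe" with an acute e, as Latin-1 bytes
    print(latin1_to_utf8(raw))  # prints b'caf\xc3\xa9'
```

Pure ASCII input comes out unchanged, so the conversion only matters for bytes above 0x7F.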
> 
> On 6/13/07, Ajanta Phatak <ap...@limewire.com> wrote:
>>
>> Hi
>>
>> Not sure if you've had a solution for your problem yet, but I had dealt
>> with a similar issue that is mentioned below and hopefully it'll help
>> you too. Of course, this assumes that your original data is in utf-8
>> format.
>>
>> The default charset encoding for MySQL is Latin1 and our display format
>> was UTF-8, and that was the problem. These are the steps I performed to
>> get the search data in UTF-8 format.
>>
>> Changed my.cnf as follows (though we can avoid this by executing commands
>> on every new connection if we don't want the whole db in UTF-8):
>>
>> Under: [mysqld] added:
>> # setting default charset to utf-8
>> collation_server=utf8_unicode_ci
>> character_set_server=utf8
>> default-character-set=utf8
>>
>> Under: [client]
>> default-character-set=utf8
>>
>> After changing, restarted mysqld, re-created the db, re-inserted all the
>> data again in the db using my data insert code (java program) and
>> re-created the Solr index. The key is to change the settings for both
>> the mysqld and client sections in my.cnf - the mysqld setting is to make
>> sure that mysql doesn't convert it to latin1 while storing the data and
>> the client setting is to ensure that the data is not converted while
>> accessing - going in or coming out from the server.
>>
>> Ajanta.
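
To illustrate why the declared charset is more than a label, here is a small Python sketch of what can go wrong when UTF-8 bytes are sent over a connection declared as Latin-1. It is a simplified model, not MySQL itself: the function interpret_as_declared is hypothetical, and Python's "latin-1" codec stands in for the server's latin1 charset.

```python
def interpret_as_declared(sent: bytes, declared: str) -> str:
    """Hypothetical model: the server reads client bytes using the charset
    the connection declares, whatever the bytes really are."""
    return sent.decode(declared)


original = "caf\u00e9"           # "cafe" with an acute e
sent = original.encode("utf-8")  # the client actually sends UTF-8 bytes

# Declaration matches the data: the text survives intact.
good = interpret_as_declared(sent, "utf-8")

# Declaration says Latin-1: each UTF-8 byte becomes its own character,
# so the two bytes of the accented e turn into two wrong characters.
bad = interpret_as_declared(sent, "latin-1")

# The damage is reversible only while the wrong interpretation is known.
repaired = bad.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(good == original)      # True
    print(len(bad), len(good))   # 5 4
    print(repaired == original)  # True
```

The point of the sketch is that the declaration drives how bytes are interpreted and converted, which is why a mismatch on either the server or the client side can corrupt text that was fine to begin with.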
>>
>>
>> Tiong Jeffrey wrote:
>> > Ya you are right! After I changed it to UTF-8 the error is still there... I
>> > looked at the log, and this is what appears:
>> >
>> > 127.0.0.1 -  -  [10/06/2007:03:52:06 +0000] "POST /solr/update HTTP/1.1" 500 4022
>> >
>> > I tried to search but couldn't understand what this error is. Does anybody
>> > have any idea?
>> >
>> > Thanks!!!
>> >
>> > On 6/10/07, Chris Hostetter <ho...@fucit.org> wrote:
>> >>
>> >> : way during indexing is - "FATAL: Connection error (is Solr running at
>> >> : http://localhost/solr/update ?): java.io.IOException: Server returned
>> >> : HTTP Response code: 500 for URL: http://local/solr/update"
>> >> : 4. Although the error code doesn't specify that it is an XML UTF-8
>> >> : encoding error, I did a bit of research and looked at the XML file I
>> >> : have, and it doesn't fulfill the UTF-8 encoding
>> >>
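
One way to check the claim that an XML file "doesn't fulfill the UTF-8 encoding" is to try decoding its raw bytes strictly and see where that fails. The Python sketch below does only that; the function name first_invalid_utf8_offset is invented for this example, and it says nothing about whether the XML is otherwise well formed.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence,
    or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8")  # strict decoding raises on any bad sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    valid = b"<doc>caf\xc3\xa9</doc>"  # accented e encoded as UTF-8
    broken = b"<doc>caf\xe9</doc>"     # same text as a raw Latin-1 byte
    print(first_invalid_utf8_offset(valid))   # None
    print(first_invalid_utf8_offset(broken))  # 8
```

For a file on disk, the bytes would come from open(path, "rb").read(), so that no decoding happens before the check.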
>> >> I *strongly* encourage you to look at the body of the response and/or the
>> >> error log of your Servlet container and find out *exactly* what the cause
>> >> of the error is ... you could spend a lot of time working on this and
>> >> discover it's not your real problem.
>> >>
>> >>
>> >>
>> >> -Hoss
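
The advice above about reading the body of the 500 response can be sketched in Python with the standard library. This is only an illustration: the helper name post_and_report and the Content-Type header are assumptions for the example, and the URL to post to is whatever your own update handler uses.

```python
import urllib.error
import urllib.request
from typing import Tuple


def post_and_report(url: str, body: bytes) -> Tuple[int, str]:
    """POST a body and return (status code, response text), keeping the
    response body even when the server answers with an error status."""
    request = urllib.request.Request(
        url,
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is itself a response object: its body is typically the
        # container's error page, which is where the real cause is reported.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Printing the returned text for a failing update shows the server's own explanation instead of just the bare status code seen in the access log.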
>> >>
>> >
>>
> 
> 
