Posted to user@couchdb.apache.org by Michael Genereux <mg...@gmail.com> on 2010/03/27 00:09:17 UTC

Rails data to CouchDB

I was going to send the problem below as a question, but I found the
solution in the meantime, so I'm sharing both so others don't have to
deal with it.

--- Problem ---
I'm exporting a portion of a massive MySQL database that I think is
better suited to CouchDB.  The MySQL database supports a Rails
application and the MySQL server is set to UTF-8 for everything.
I'm using to_json in Rails, which appears to convert the records to
JSON just fine.  I get about 200 records converted and imported into
CouchDB and then the process dies with "Invalid UTF-8 JSON".  One of
the fields in the offending record contains the word "fête".  The JSON
produced by Rails doesn't convert this character to the \uXXXX escape
notation.  I don't think it should have to, but maybe I'm not
understanding the standard.
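
For anyone who wants to check a record before sending it to CouchDB,
here is a minimal sketch of why a string like this gets rejected.  It
assumes the stored byte for the accented character is the single
ISO-8859-1 byte 0xEA; the variable names are only for illustration.

```ruby
# "fête" as ISO-8859-1 bytes: the accented character is the single byte 0xEA.
latin1_bytes = "f\xEAte".b

# "fête" as UTF-8 bytes: the same character is the two bytes 0xC3 0xAA.
utf8_bytes = "f\xC3\xAAte".b

# Label each byte string as UTF-8 and ask Ruby whether the bytes are well formed.
latin1_valid = latin1_bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
utf8_valid   = utf8_bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?

puts "ISO-8859-1 bytes read as UTF-8: valid? #{latin1_valid}"
puts "UTF-8 bytes read as UTF-8:      valid? #{utf8_valid}"
```

The first check fails because 0xEA announces a multi-byte sequence and
the next byte ("t") is not a continuation byte.  Raw non-ASCII
characters are fine in JSON as long as the bytes are valid UTF-8, so
the missing \uXXXX escape was not the problem.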

--- Solution ---
The original importer of the data took ISO-8859-1 data and jammed it
into a UTF-8 field in the database.  The character I was having
problems with was being silently translated by the web browser, which
falls back to a kind of ASCII/ISO-8859-1/Windows-1252 interpretation
for bytes that aren't valid UTF-8.  That is why I could cut and paste
the converted Unicode character from phpMyAdmin right into Futon and
the JSON was valid.  The solution within Rails was to run "json_data =
Iconv.conv( 'utf-8', 'iso-8859-1', json_data )" after converting the
Rails object to JSON, which re-encodes the bad bytes as proper UTF-8.
Worked like a charm!
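
Iconv was removed from the Ruby standard library in Ruby 2.0, so on a
newer Ruby the same conversion can be sketched with String#encode.
This is only a sketch under the assumption that any string which is
not valid UTF-8 is really ISO-8859-1; ensure_utf8 is a hypothetical
helper name, not something from Rails or CouchDB.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of str.
# Assumption: bytes that are not valid UTF-8 are really ISO-8859-1.
def ensure_utf8(str)
  # First see whether the bytes are already well-formed UTF-8.
  as_utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Otherwise reinterpret the bytes as ISO-8859-1 and transcode to UTF-8.
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

json_data = "{\"title\":\"f\xEAte\"}".b  # JSON text carrying a stray 0xEA byte
json_data = ensure_utf8(json_data)
puts json_data.valid_encoding?
```

Checking validity first keeps strings that are already good UTF-8 from
being double-encoded, which a blanket conversion of every record would
do.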