Posted to user@cayenne.apache.org by Andrus Adamchik <an...@objectstyle.org> on 2007/06/01 09:25:36 UTC

Re: UTF8 problem

Usually the MySQL JDBC driver can detect the encoding. I'm not sure why
it does not in your case. But you can always force UTF-8 via a
connection URL parameter:

jdbc:mysql://localhost....?characterEncoding=UTF-8

http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html
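
As a minimal sketch of the URL parameter above, the helper below appends
characterEncoding=UTF-8 to a MySQL JDBC URL. The class name, the withUtf8
helper, and the host, port and database name ("localhost:3306/mydb") are
placeholders chosen for illustration; they are not values from this thread.

```java
final class MysqlUrlSketch {

    // Appends characterEncoding=UTF-8, using '?' or '&' depending on
    // whether the URL already carries parameters.
    static String withUtf8(String jdbcUrl) {
        char separator = jdbcUrl.indexOf('?') >= 0 ? '&' : '?';
        return jdbcUrl + separator + "characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Hypothetical host, port and database name.
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/mydb"));
    }
}
```

The resulting string goes wherever the JDBC URL is configured for the
application, for example in the data source settings that Cayenne uses
to open connections.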

Andrus


On May 31, 2007, at 3:43 PM, marco turchi wrote:

> Dear experts,
> I have a strange situation when I read data from mysql using cayenne.
> The table is encoded in utf8. I read the data, but some characters,
> such as accented letters, are not correctly encoded.
> If I read the data inside mysql with a query, they are correct, but
> if I read them through cayenne and then display them, they are not.
>
> Please, can you help me?
>
> thanks a lot
> Marco
>


Re: UTF8 problem

Posted by marco turchi <ma...@gmail.com>.
:-(
I have found the problem.....
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1

All the character sets are latin1 and not UTF8... it is a tragedy!
If I understand correctly, I need to recreate the db using the utf8
character set. Is it possible to change the character set now without
recreating the db?

Thanks a lot for your time
Marco




Re: UTF8 problem

Posted by marco turchi <ma...@gmail.com>.
Dear Andrus,
first of all, thanks a lot for your patience.

I have created a new database using character set utf8 and collation utf8...
Then I set the following variables:
 SET character_set_server = utf8;
 SET character_set_client = utf8;
 SET character_set_database = utf8;
now the "status" command gives me:
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8

Then I created a test table:
create table prova ( id_prova  int(11) NOT NULL auto_increment,
description longtext, PRIMARY KEY  (id_prova) ) character set  utf8;
and I filled it with utf8 data.

and now....
...
...
it works!!!!!!!!!!!!

As a test, I tried to fill the test table with the data from the old
table, and I got the same wrong result because they are latin1.
The point, as you told me, is that the data in my old table are not
utf8. I hope to be able to convert them...
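
One possible conversion on the Java side can be sketched as follows. It
assumes that the old rows contain UTF-8 bytes that were decoded as a
single-byte Latin charset somewhere on the way, which this thread does not
establish; if the old data are genuinely latin1, this would damage them.
The class and method names are hypothetical, and ISO-8859-1 stands in for
MySQL's latin1 (which is closer to windows-1252, so a few characters in
the 0x80-0x9F range may need different handling).

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepairSketch {

    // Turns the garbled characters back into the bytes they came from,
    // then decodes those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "sar" + U+00C3 + U+00A0 is how "sarà" looks after the wrong decoding.
        String repaired = repair("sar\u00C3\u00A0");
        System.out.println(repaired.equals("sar\u00E0"));
    }
}
```

Trying this on a copy of a few rows first, and comparing the result with
what the mysql client shows, is safer than converting the whole table.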
Thanks for everything
Marco





Re: UTF8 problem

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Jun 2, 2007, at 9:43 PM, marco turchi wrote:

> | summary       | longtext     | utf8_general_ci | YES  |
> | description   | longtext     | utf8_general_ci | YES  |

I believe the third column is "collation", not "encoding"; so  
encoding is still "latin".


> My guess is that the data are encoded as UTF8 inside the table, but when
> Cayenne creates a connection, all the data that pass through that
> connection are encoded as latin1. Is that right?

*** Disclaimer: I've never tried the advice below myself, only found
it in the MySQL docs. So I suggest taking a full MySQL dump from your
production DB, loading it into an offline test DB, and trying it there
first, before applying it to production.



With ALTER DATABASE and ALTER TABLE you can change the default  
database charset and a default table charset on MySQL 5.0:

http://dev.mysql.com/doc/refman/5.0/en/alter-database.html
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

Be careful with various ALTER TABLE charset options. According to the  
docs there are different ways to address a number of related but  
distinct charset conversion issues. You need to pick the one that is  
appropriate for you. So definitely do it on a test DB first.

Good luck!
Andrus




Re: UTF8 problem

Posted by marco turchi <ma...@gmail.com>.
To be honest, I do not understand what the actual encoding of my data is.
If I type "status", I see that the server, db, connection and client
are latin1, but if I type "show full columns from FeedsAll;", I
get:
| summary       | longtext     | utf8_general_ci | YES  |
| description   | longtext     | utf8_general_ci | YES  |

My guess is that the data are encoded as UTF8 inside the table, but when
Cayenne creates a connection, all the data that pass through that
connection are encoded as latin1. Is that right?

Thanks a lot
Marco




Re: UTF8 problem

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Jun 1, 2007, at 8:44 PM, marco turchi wrote:

> Dear Andrus,
> I do not have admin privileges, so I cannot use mydb :-(
> I'll ask the administrator to check it for me.

You need to replace "mydb" with the actual name of your database.


> Should I add the "useUnicode=true" parameter in the same place as
> "characterEncoding=UTF-8"?

Yes, this is a part of the JDBC URL. You can also use them both at  
the same time: jdbc:mysql://....?characterEncoding=UTF-8&useUnicode=true

Andrus

RE: UTF8 problem

Posted by Fredrik Liden <fl...@translate.com>.
Hi Marco,

I'm guessing the text is stored correctly as UTF-8 in the database.
The "Ã " symbol showing up instead of "à" indicates that the text is stored as UTF-8 but is being read or displayed as windows-1252 (Western European).

Type "sfasdfasdfasdà" in Notepad. Save it as UTF-8.
Then, in the bottom right corner, change the encoding to Western European. You'll see "sfasdfasdfasdÃ ".
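
The same effect can be reproduced in a few lines of Java. This is only a
sketch: the class and method names are made up for illustration, and
ISO-8859-1 is used in place of windows-1252 because every JVM is required
to support it. For the two bytes that encode "à" in UTF-8 (0xC3 0xA0) both
charsets give the same result: "Ã" followed by a non-breaking space.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeSketch {

    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "sar\u00E0" is "sarà"; escapes keep the source file encoding-neutral.
        String garbled = misread("sar\u00E0");
        // Print code points so the result does not depend on the console encoding.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

One accented character turning into two characters, the first being "Ã",
is the usual sign that UTF-8 bytes are being decoded with a single-byte
Western European charset somewhere between the database and the display.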

If you're viewing it in a web page, the first thing I would check is that the browser's "View -> Encoding" setting is UTF-8 and not Western European or something else. If it is UTF-8 and the text is still not showing up correctly, then somewhere along the line you'd need to tell JDBC to fetch it as UTF-8. Let me know how it goes, as I need UTF-8 support in my next app.

Fredrik



Re: UTF8 problem

Posted by marco turchi <ma...@gmail.com>.
Dear Andrus,
I do not have admin privileges, so I cannot use mydb :-(
I'll ask the administrator to check it for me.
Should I add the "useUnicode=true" parameter in the same place as
"characterEncoding=UTF-8"?

Thanks
Marco

Re: UTF8 problem

Posted by Andrus Adamchik <an...@objectstyle.org>.
I still suspect that this is a JDBC or MySQL problem, not Cayenne.
Here is another URL parameter you may try: "useUnicode=true". Also,
you may want to double-check whether the database was configured to
support UTF-8. Enter the "mysql" prompt and do something like this:

 > use mydb;
 > status;

This should print a bunch of info, including this:

Server characterset:    utf-8
Db     characterset:    utf-8
Client characterset:    utf-8
Conn.  characterset:    utf-8

If it prints anything other than utf-8, you may need to recreate the  
DB with an appropriate charset.
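
A similar check can be sketched from Java over the JDBC connection itself,
which also shows what the driver actually negotiated. SHOW VARIABLES is a
standard MySQL statement; the class and method names below are
hypothetical, and the assumption that these four variables correspond to
the four "status" lines is a reading of the output above, not something
confirmed in this thread.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CharsetCheckSketch {

    // Variables assumed to match the Server/Db/Client/Conn. lines of "status".
    static final String[] CHECKED = {
        "character_set_server", "character_set_database",
        "character_set_client", "character_set_connection"
    };

    // Reads the character_set* variables visible to this connection.
    static Map<String, String> readCharsetVariables(Connection connection) throws SQLException {
        Map<String, String> variables = new LinkedHashMap<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rows.next()) {
                variables.put(rows.getString(1), rows.getString(2));
            }
        }
        return variables;
    }

    // Lists the checked variables that are missing or not set to a utf8 charset.
    static List<String> notUtf8(Map<String, String> variables) {
        List<String> offenders = new ArrayList<>();
        for (String name : CHECKED) {
            String value = variables.get(name);
            if (value == null || !value.startsWith("utf8")) {
                offenders.add(name + "=" + value);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Hand-made sample instead of a live connection: everything latin1.
        Map<String, String> sample = new LinkedHashMap<>();
        for (String name : CHECKED) {
            sample.put(name, "latin1");
        }
        System.out.println(notUtf8(sample));
    }
}
```

With a real connection, notUtf8(readCharsetVariables(connection)) should
return an empty list once the server, database and driver all agree on
utf8.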

Andrus




Re: UTF8 problem

Posted by marco turchi <ma...@gmail.com>.
Dear Kevin,
I have tried, but nothing has changed.
Here is an Italian example of my problem:
1a)Assemblea e scontro in redazione per l'allegato di Michela
Brambilla Interviene anche il direttore Belpietro: "Il quotidiano sarà
in edicola" Inserto dei "Circoli della Libertà" e al "Giornale" scatta
lo sciopero
1b)Assemblea e scontro in redazione per l'allegato di Michela
Brambilla Interviene anche il direttore Belpietro: "Il quotidiano sarÃ
 in edicola" Inserto dei "Circoli della Libertà " e al "Giornale"
scatta lo sciopero

where 1a is obtained directly from mysql, and 1b is obtained through Java/Cayenne.
One difference I have noticed: if I write the sentence to two different
files, using mysql for 1a and Java for 1b, the first is reported as
"UTF-8 Unicode English text", while the second is reported as
"UTF-8 Unicode text".
Sorry about that.

Thanks
Marco



RE: UTF8 problem

Posted by Kevin Menard <km...@servprise.com>.
> -----Original Message-----
> From: marco turchi [mailto:marco.turchi@gmail.com] 
> Sent: Friday, June 01, 2007 9:26 AM
> To: user@cayenne.apache.org
> Subject: Re: UTF8 problem
> 
> I'm using Cayenne 1.2.1, could the version be the problem?
> Note that the languages of my shell are:
> en_GB.UTF-8:en_GB:en

I don't know for certain that this will fix your problem, but you should
probably try 1.2.3.  It's the latest 1.2.x release, is fully
backward-compatible with 1.2.1, and includes a decent number of bug
fixes.

-- 
Kevin

Re: UTF8 problem

Posted by marco turchi <ma...@gmail.com>.
Dear Andrus,
I forced the charset encoding, but nothing changed.
The strange thing is that if I run my software and send the output
to a file using a pipe, the file is reported as:
pippo.txt:             UTF-8 Unicode text, with very long lines
but the accented letters are wrongly encoded... while in the database
they are correctly encoded....

I'm using Cayenne 1.2.1, could the version be the problem?
Note that the languages of my shell are:
en_GB.UTF-8:en_GB:en

Thanks a lot
Marco
