Posted to user@ofbiz.apache.org by John Hays <jo...@MaverickLabel.com> on 2009/05/12 22:59:51 UTC

Latin1 encoding

I am looking at importing thousands of records from an existing system
into OFBiz. Our traditional storage charset is UTF-8.

I note that OFBiz seems to prefer Latin1 as its encoding charset.

When I load a generated XML file (created using webtools against a
tab-delimited file) from the command line, it chokes on any accented
characters in names, etc. (It seems to work better through the form
interface.)
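
One thing worth ruling out before touching the tables (an assumption, not a diagnosis of this report) is a mismatch between the bytes in the generated file and the encoding the XML parser expects. The sketch below is plain Java with no OFBiz calls; the class name Utf8Check and its method are hypothetical. It decodes a file strictly as UTF-8 and reports whether any byte sequence is malformed, which is what a Latin1-encoded accented character looks like to a UTF-8 reader.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of OFBiz: checks whether bytes are well-formed UTF-8.
class Utf8Check {

    // Returns true only if every byte sequence in data decodes as UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently
                    // substituting the replacement character.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

If the file turns out not to be valid UTF-8, the problem may be in how the file was written rather than in the database. If it is valid, the XML declaration at the top of the file and the database connection settings are the next places to look.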

If I move the tables to UTF-8, will I break OFBiz?


John D. Hays
Director of Information Technology



www.mavericklabel.com
120 West Dayton Street
Edmonds, WA 98020-4180


Re: Latin1 encoding

Posted by "John D. Hays" <jo...@mavericklabel.com>.
David E. Jones wrote:
> On Tue, 2009-05-12 at 18:35 -0700, John D. Hays wrote:
>> David E Jones wrote:
>>> Could you be more specific? What makes you think OFBiz likes Latin1 as
>>> an encoding character set?
>>
>> It's in the entity definitions in the SVN code.
>
> I'm not sure what this means... do you mean specifically the
> entityengine.xml file?

Yes.

>>> As a random guess of the direction you're going: which database are
>>> you using, is it MySQL?
>>
>> Yes, MySQL.
>
> This could be the issue. MySQL doesn't handle UTF-8 (or any multi-byte
> character set) very well. There are (or used to be) some JDBC driver
> issues, but the big problem is that MySQL column sizes are in bytes and
> NOT in characters, and a single UTF-8 character can take up to 3 bytes
> in MySQL's utf8 character set. In other words, if you put a
> 100-character UTF-8 string into the database it can require up to 300
> bytes, and if it is a size 255 column then BOOM! String too long error
> message.
>
> That is why in the entityengine.xml file the default datasource for
> MySQL has the char set as a non multi-byte character set.
>
> If you need to do internationalized text OFBiz will handle it great, but
> MySQL won't. I'd recommend you use Postgres or something else instead.
>
> -David
>
In the cases that I am referring to, we are currently using MySQL and
UTF-8. We could switch to a different database such as Postgres, but UTF-8
seems to work in MySQL for what we have been doing. (I have had to use
byte[] rather than getString through JDBC to pull some UTF-8 strings
before, but have made it work.)
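
For readers unfamiliar with the byte[] workaround mentioned above, a minimal sketch of the idea follows. It is an illustration, not the poster's actual code: the class name Utf8Column and its methods are hypothetical. The only JDBC call used is ResultSet.getBytes(String), which returns the raw column bytes so the application can choose the charset itself instead of trusting the driver's conversion.

```java
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper illustrating the workaround; not taken from OFBiz or the thread.
class Utf8Column {

    // Decode raw bytes as UTF-8; a null input (SQL NULL) stays null.
    static String decodeUtf8(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    // Read a column as raw bytes and decode it explicitly,
    // bypassing whatever charset conversion getString would apply.
    static String readUtf8(ResultSet rs, String column) throws SQLException {
        return decodeUtf8(rs.getBytes(column));
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of "Jose" with an acute accent on the e (U+00E9).
        byte[] raw = {0x4a, 0x6f, 0x73, (byte) 0xc3, (byte) 0xa9};

        String correct = decodeUtf8(raw);
        String misread = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(correct.length()); // 4: the accented letter is one char
        System.out.println(misread.length()); // 5: each byte became its own char
    }
}
```

The length difference in main is the usual symptom of the mismatch: the same bytes read as Latin1 produce two characters where UTF-8 has one.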

Will investigate further.




Re: Latin1 encoding

Posted by "David E. Jones" <da...@hotwaxmedia.com>.
On Tue, 2009-05-12 at 18:35 -0700, John D. Hays wrote:
> David E Jones wrote:
> >
> > Could you be more specific? What makes you think OFBiz likes Latin1 as 
> > an encoding character set?
> >
> It's in the entity definitions in the SVN code.

I'm not sure what this means... do you mean specifically the
entityengine.xml file?

> > As a random guess of the direction you're going: which database are 
> > you using, is it MySQL?
> 
> Yes, MySQL.

This could be the issue. MySQL doesn't handle UTF-8 (or any multi-byte
character set) very well. There are (or used to be) some JDBC driver
issues, but the big problem is that MySQL column sizes are in bytes and
NOT in characters, and a single UTF-8 character can take up to 3 bytes
in MySQL's utf8 character set. In other words, if you put a
100-character UTF-8 string into the database it can require up to 300
bytes, and if it is a size 255 column then BOOM! String too long error
message.
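
To make the byte arithmetic concrete, here is a small Java sketch (the class and method names are illustrative, not from OFBiz or MySQL). It shows that the encoded size depends on the characters involved: ASCII letters take 1 byte each in UTF-8, accented Latin letters take 2, and the 3-byte figure is the worst case for characters such as the euro sign. Characters outside the Basic Multilingual Plane take 4 bytes in UTF-8 proper, which MySQL's utf8 character set of that era did not store.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: compares encoded byte counts for the same strings.
class Utf8ByteLength {

    // Bytes needed to store s as UTF-8 (1 to 4 bytes per code point).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store s as Latin1 (ISO-8859-1): always 1 byte per char.
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Worst-case storage for a column of the given length in characters.
    static int worstCaseBytes(int chars, int maxBytesPerChar) {
        return chars * maxBytesPerChar;
    }

    public static void main(String[] args) {
        String ascii = "Hays";
        String accented = "Jos\u00e9"; // e with acute accent, U+00E9
        String euro = "\u20ac";        // euro sign, U+20AC

        System.out.println(utf8Bytes(ascii));       // 4
        System.out.println(utf8Bytes(accented));    // 5
        System.out.println(latin1Bytes(accented));  // 4
        System.out.println(utf8Bytes(euro));        // 3
        System.out.println(worstCaseBytes(100, 3)); // 300
    }
}
```

So a 100-character name made mostly of ASCII letters with a few accents needs little more than 100 bytes; 300 is the upper bound, not the typical case.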

That is why in the entityengine.xml file the default datasource for
MySQL has the char set as a non multi-byte character set.

If you need to do internationalized text OFBiz will handle it great, but
MySQL won't. I'd recommend you use Postgres or something else instead.

-David



Re: Latin1 encoding

Posted by "John D. Hays" <jo...@mavericklabel.com>.
David E Jones wrote:
>
> Could you be more specific? What makes you think OFBiz likes Latin1 as 
> an encoding character set?
>
It's in the entity definitions in the SVN code.
> As a random guess of the direction you're going: which database are 
> you using, is it MySQL?

Yes, MySQL.


Re: Latin1 encoding

Posted by David E Jones <da...@hotwaxmedia.com>.
Could you be more specific? What makes you think OFBiz likes Latin1 as  
an encoding character set?

As a random guess of the direction you're going: which database are  
you using, is it MySQL?

-David

