Posted to derby-user@db.apache.org by David Van Couvering <da...@vancouvering.com> on 2007/09/06 19:35:27 UTC

Derby and character set encodings

Hi, all.  I am getting some questions from Ken Frank of the NetBeans
internationalization quality team about Java DB and character set
encodings.  Rather than try to play go-between, I'm including him
here so he can directly ask any follow-on questions.

Ken would like to understand how Derby makes use of character
encodings, and how it is affected by various settings.  How does
Derby handle things if the encoding is set to something different from
our default of UTF-8?  Are we impacted, or do we rely on Java routines
such as the Collator and Comparator classes to handle this?

Sorry if I'm talking out my ear, i18n is not one of my fortes.

Thanks,

David

Re: Derby and character set encodings

Posted by Mike Matrigali <mi...@sbcglobal.net>.

This is mixing a lot of things up.  I also may use the wrong
terminology here.

Character set encodings really only come into play with tools such as
ij and import, which bring strings from the environment into Derby.  The more
standard interaction is using JDBC to load a Java string into Derby.
At that level we don't do anything with encodings.
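
A minimal sketch of that boundary, using only the JDK (no Derby calls; the class and method names are made up for illustration). It assumes a tool reading a file has to pick a charset to turn bytes into a String, and shows that the choice changes the resulting String. After that point the value is an ordinary Java String, which is what JDBC calls such as PreparedStatement.setString receive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBoundaryDemo {
    // Decode raw bytes with an explicit charset, as a file-reading tool must do.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The letter e-acute as written by an editor that saves UTF-8: bytes C3 A9.
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};

        // Correct charset: one char, U+00E9.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Wrong charset: each byte becomes its own char, U+00C3 then U+00A9.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.length() + " char vs " + wrong.length() + " chars");
    }
}
```

The point of the sketch is only that the charset decision happens before the String exists; nothing in it says how ij or import choose their charset.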

We happen to use a modified UTF-8 to store data on disk, and this is
not configurable.  But no user interface should depend on this encoding,
and Derby could change this storage in the future.
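
For readers unfamiliar with the term, here is a sketch of what "modified UTF-8" means in the JDK itself, via DataOutputStream.writeUTF. This is an illustration only: it is an assumption that Derby's on-disk format resembles the JDK variant, and it is not claimed to be byte-identical. The class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Encode with the JDK's modified UTF-8, dropping the 2-byte length prefix
    // that writeUTF puts in front of the encoded characters.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        // Standard UTF-8 writes the NUL char as a single zero byte.
        System.out.println("standard: " + hex(nul.getBytes(StandardCharsets.UTF_8)));
        // Modified UTF-8 writes it as the two-byte sequence c0 80.
        System.out.println("modified: " + hex(modifiedUtf8(nul)));
        // Plain ASCII is identical in both forms.
        System.out.println("ascii:    " + hex(modifiedUtf8("A")));
    }
}
```

Either way, the stored bytes are an internal detail; applications see Java Strings on both sides of the JDBC API.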

Logically, all strings are converted to standard Java chars at runtime.

Before 10.3 we always used the standard Java string comparison, which
compares the numeric Unicode values of the chars to arrive at an
ordering.  That is still the default.  In 10.3 an option was added to
set a territory-based collation when the database is created, so that
comparison depends on the territory of the database.  For this, the
standard Java rule-based Collator interfaces are used.  This is
documented in the latest Derby release.
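
The difference between the two orderings can be sketched with the JDK alone. The example below is not Derby code; the class and method names are hypothetical, and Locale.US stands in for whatever territory a database might be created with. In Derby 10.3 the territory-based behavior is, as far as I know, requested at creation time with the territory and collation=TERRITORY_BASED connection attributes; check the release documentation for the exact syntax.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationDemo {
    // Default ordering: String.compareTo, i.e. numeric comparison of char values.
    static String[] sortByCharValue(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Locale-sensitive ordering using the JDK's rule-based Collator.
    static String[] sortByCollator(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "apple", "Banana"};
        // Uppercase 'B' (66) sorts before lowercase 'a' (97) by char value.
        System.out.println(Arrays.toString(sortByCharValue(words)));
        // The collator orders alphabetically and treats case as a minor difference.
        System.out.println(Arrays.toString(sortByCollator(Locale.US, words)));
    }
}
```

With char-value ordering "Banana" comes first; with the US collator "apple" does. An ORDER BY or comparison in a database would differ in the same way depending on which collation is in effect.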

David Van Couvering wrote:
> Hi, all.  I am getting some questions from Ken Frank NetBeans
> internationalization quality team about Java DB and character set
> encodings.  Rather than try and play go-between, I'm including him
> here so he can directly ask any follow-on questions.
> 
> Ken would like to understand how Derby makes use of character
> encodings, and how it is affected by  various settings.  How does
> Derby handle things if the encoding is set to something different from
> our default of UTF-8?  Are we impacted, or do we rely on Java routines
> such as the Collator and Comparator class to handle this?
> 
> Sorry if I'm talking out my ear, i18n is not one of my fortes.
> 
> Thanks,
> 
> David
>

Re: Derby and character set encodings

Posted by Daniel John Debrunner <dj...@apache.org>.

Ken Frank wrote:
> the one remaining question, for the folks at derby-user (and adding 
> derby-dev) is the first one:
> 
> 1.  when one creates a new derby database,
> is the database created with a certain encoding that will be used ?

No. The database doesn't have an encoding.

Dan.

Re: Derby and character set encodings

Posted by Ken Frank <Ke...@Sun.COM>.

The one remaining question, for the folks at derby-user (and adding
derby-dev), is the first one:

1.  When one creates a new Derby database,
is the database created with a certain encoding that will be used?

Or is there an argument to the create command that indicates the encoding to be used?

And if so, is that encoding the default encoding of the locale I am in when I run
the create database command, or is it always UTF-8?
(For example, for one of the Japanese locales of Solaris, the encoding
is EUC-JP.)

Or could it be the encoding of the locale the actual database server
is started in?  (That might be Java's view of the user's locale/encoding,
which I think would be the same as the OS locale the user is in.)

That is, the user might start the database server in a different locale from the one where they start NetBeans.

Thanks - Ken
===========================================================================




David Van Couvering wrote:
> I think I can actually answer some of these questions :)
>
> On 9/6/07, Ken Frank <Ke...@sun.com> wrote:
>   
>> Thanks David for sending this.
>>
>> Let me note a few questions:
>>
>> 1.  when one creates a new database,
>> is the database created with a certain encoding that will be used ?
>>
>> And if so, is that encoding that of the locale I am in when I run
>> the create database commands or is it utf-8 always ?
>> (for example, for one of the Japanese locales of Solaris, the encoding of it
>> is euc-jp)
>>
>> or could it be that of the encoding of the locale the actual dbase server
>> is started in ?  (which might be java's view of the users locale/encoding
>> which would be I think the same as the OS locale user is in)
>>
>> I saw this from derby docs:
>> "To support users in many different languages, Derby's SQL parser
>> understands all Unicode characters and allows any Unicode character or
>> number to be used in an identifier."
>>
>> but I don't know if it means that there is no concept of an encoding
>> for a database itself or not.
>>
>> I think with Oracle for example, there is an argument to create database
>> that lets one specify the encoding of it.
>>
>>     
>
> This question stumps me, I'll leave it to others...
>
>   
>> 2.  The locale the user is in when starting derby server -
>> what things are affected by that - ie encoding of dbase, messages to
>> user (if translated), time, date, etc ?
>> (vs user needing to set separate variables or properties)
>>
>>     
>
> I don't know what "encoding of the dbase" means, but the other display
> stuff: exception messages, time and date and money formats, etc., are
> all controlled by locale.
>
>   
>> 3.  I think its allowed for identifiers like database names,
>> table and column names, to have non ascii in them, if proper
>> quoting is used when referring to them  ?
>>
>>     
>
> Yes, that's right.
>
>   
>> Thanks - Ken
>>
>>
>> David Van Couvering wrote:
>>
>>     
>>> Hi, all.  I am getting some questions from Ken Frank NetBeans
>>> internationalization quality team about Java DB and character set
>>> encodings.  Rather than try and play go-between, I'm including him
>>> here so he can directly ask any follow-on questions.
>>>
>>> Ken would like to understand how Derby makes use of character
>>> encodings, and how it is affected by  various settings.  How does
>>> Derby handle things if the encoding is set to something different from
>>> our default of UTF-8?  Are we impacted, or do we rely on Java routines
>>> such as the Collator and Comparator class to handle this?
>>>
>>> Sorry if I'm talking out my ear, i18n is not one of my fortes.
>>>
>>> Thanks,
>>>
>>> David
>>>
>>>
>>>       

-- 
========================================
if your reply to this mail bounces,
and reply was sent to kenf@<somemachinename>,
then please reply to ken.frank@sun.com  instead
===========================================

Re: Derby and character set encodings

Posted by Ken Frank <Ke...@Sun.COM>.

It's the correct Andrey; he works with me on i18n.
But thanks for sending it to Andrei as well.

Ken


David Van Couvering wrote:

>I think this was actually meant to go to a different Andrei (sorry Andrey)
>
>On 9/6/07, David Van Couvering <da...@vancouvering.com> wrote:
>  
>
>>I think I can actually answer some of these questions :)
>>
>>On 9/6/07, Ken Frank <Ke...@sun.com> wrote:
>>    
>>
>>>Thanks David for sending this.
>>>
>>>Let me note a few questions:
>>>
>>>1.  when one creates a new database,
>>>is the database created with a certain encoding that will be used ?
>>>
>>>And if so, is that encoding that of the locale I am in when I run
>>>the create database commands or is it utf-8 always ?
>>>(for example, for one of the Japanese locales of Solaris, the encoding of it
>>>is euc-jp)
>>>
>>>or could it be that of the encoding of the locale the actual dbase server
>>>is started in ?  (which might be java's view of the users locale/encoding
>>>which would be I think the same as the OS locale user is in)
>>>
>>>I saw this from derby docs:
>>>"To support users in many different languages, Derby's SQL parser
>>>understands all Unicode characters and allows any Unicode character or
>>>number to be used in an identifier."
>>>
>>>but I don't know if it means that there is no concept of an encoding
>>>for a database itself or not.
>>>
>>>I think with Oracle for example, there is an argument to create database
>>>that lets one specify the encoding of it.
>>>
>>>      
>>>
>>This question stumps me, I'll leave it to others...
>>
>>    
>>
>>>2.  The locale the user is in when starting derby server -
>>>what things are affected by that - ie encoding of dbase, messages to
>>>user (if translated), time, date, etc ?
>>>(vs user needing to set separate variables or properties)
>>>
>>>      
>>>
>>I don't know what "encoding of the dbase" means, but the other display
>>stuff: exception messages, time and date and money formats, etc., are
>>all controlled by locale.
>>
>>    
>>
>>>3.  I think its allowed for identifiers like database names,
>>>table and column names, to have non ascii in them, if proper
>>>quoting is used when referring to them  ?
>>>
>>>      
>>>
>>Yes, that's right.
>>
>>    
>>
>>>Thanks - Ken
>>>
>>>
>>>David Van Couvering wrote:
>>>
>>>      
>>>
>>>>Hi, all.  I am getting some questions from Ken Frank NetBeans
>>>>internationalization quality team about Java DB and character set
>>>>encodings.  Rather than try and play go-between, I'm including him
>>>>here so he can directly ask any follow-on questions.
>>>>
>>>>Ken would like to understand how Derby makes use of character
>>>>encodings, and how it is affected by  various settings.  How does
>>>>Derby handle things if the encoding is set to something different from
>>>>our default of UTF-8?  Are we impacted, or do we rely on Java routines
>>>>such as the Collator and Comparator class to handle this?
>>>>
>>>>Sorry if I'm talking out my ear, i18n is not one of my fortes.
>>>>
>>>>Thanks,
>>>>
>>>>David
>>>>
>>>>
>>>>        
>>>>

Re: Derby and character set encodings

Posted by David Van Couvering <da...@vancouvering.com>.

I think this was actually meant to go to a different Andrei (sorry Andrey)

On 9/6/07, David Van Couvering <da...@vancouvering.com> wrote:
> I think I can actually answer some of these questions :)
>
> On 9/6/07, Ken Frank <Ke...@sun.com> wrote:
> > Thanks David for sending this.
> >
> > Let me note a few questions:
> >
> > 1.  when one creates a new database,
> > is the database created with a certain encoding that will be used ?
> >
> > And if so, is that encoding that of the locale I am in when I run
> > the create database commands or is it utf-8 always ?
> > (for example, for one of the Japanese locales of Solaris, the encoding of it
> > is euc-jp)
> >
> > or could it be that of the encoding of the locale the actual dbase server
> > is started in ?  (which might be java's view of the users locale/encoding
> > which would be I think the same as the OS locale user is in)
> >
> > I saw this from derby docs:
> > "To support users in many different languages, Derby's SQL parser
> > understands all Unicode characters and allows any Unicode character or
> > number to be used in an identifier."
> >
> > but I don't know if it means that there is no concept of an encoding
> > for a database itself or not.
> >
> > I think with Oracle for example, there is an argument to create database
> > that lets one specify the encoding of it.
> >
>
> This question stumps me, I'll leave it to others...
>
> >
> >
> > 2.  The locale the user is in when starting derby server -
> > what things are affected by that - ie encoding of dbase, messages to
> > user (if translated), time, date, etc ?
> > (vs user needing to set separate variables or properties)
> >
>
> I don't know what "encoding of the dbase" means, but the other display
> stuff: exception messages, time and date and money formats, etc., are
> all controlled by locale.
>
> > 3.  I think its allowed for identifiers like database names,
> > table and column names, to have non ascii in them, if proper
> > quoting is used when referring to them  ?
> >
>
> Yes, that's right.
>
> >
> > Thanks - Ken
> >
> >
> > David Van Couvering wrote:
> >
> > >Hi, all.  I am getting some questions from Ken Frank NetBeans
> > >internationalization quality team about Java DB and character set
> > >encodings.  Rather than try and play go-between, I'm including him
> > >here so he can directly ask any follow-on questions.
> > >
> > >Ken would like to understand how Derby makes use of character
> > >encodings, and how it is affected by  various settings.  How does
> > >Derby handle things if the encoding is set to something different from
> > >our default of UTF-8?  Are we impacted, or do we rely on Java routines
> > >such as the Collator and Comparator class to handle this?
> > >
> > >Sorry if I'm talking out my ear, i18n is not one of my fortes.
> > >
> > >Thanks,
> > >
> > >David
> > >
> > >
> >
>

Re: Derby and character set encodings

Posted by David Van Couvering <da...@vancouvering.com>.

I think I can actually answer some of these questions :)

On 9/6/07, Ken Frank <Ke...@sun.com> wrote:
> Thanks David for sending this.
>
> Let me note a few questions:
>
> 1.  when one creates a new database,
> is the database created with a certain encoding that will be used ?
>
> And if so, is that encoding that of the locale I am in when I run
> the create database commands or is it utf-8 always ?
> (for example, for one of the Japanese locales of Solaris, the encoding of it
> is euc-jp)
>
> or could it be that of the encoding of the locale the actual dbase server
> is started in ?  (which might be java's view of the users locale/encoding
> which would be I think the same as the OS locale user is in)
>
> I saw this from derby docs:
> "To support users in many different languages, Derby's SQL parser
> understands all Unicode characters and allows any Unicode character or
> number to be used in an identifier."
>
> but I don't know if it means that there is no concept of an encoding
> for a database itself or not.
>
> I think with Oracle for example, there is an argument to create database
> that lets one specify the encoding of it.
>

This question stumps me, I'll leave it to others...

>
>
> 2.  The locale the user is in when starting derby server -
> what things are affected by that - ie encoding of dbase, messages to
> user (if translated), time, date, etc ?
> (vs user needing to set separate variables or properties)
>

I don't know what "encoding of the dbase" means, but the other display
stuff: exception messages, time and date and money formats, etc., are
all controlled by locale.
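
As a rough illustration of locale-controlled display, here is a JDK-only sketch (no Derby involved; the class and method names are made up). It formats the same number for two locales; message text and date formats vary by locale in the same spirit.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleFormattingDemo {
    // Format a number using the conventions of the given locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234.5;
        // US conventions: comma for grouping, period for the decimal point.
        System.out.println(formatNumber(value, Locale.US));
        // German conventions: period for grouping, comma for the decimal point.
        System.out.println(formatNumber(value, Locale.GERMANY));
    }
}
```

Note that this concerns presentation only; the stored value is the same regardless of the locale used to display it.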

> 3.  I think its allowed for identifiers like database names,
> table and column names, to have non ascii in them, if proper
> quoting is used when referring to them  ?
>

Yes, that's right.
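
A small sketch of the quoting involved, assuming standard SQL delimited identifiers (double quotes, with an embedded double quote doubled). The quoteIdentifier helper and the table name are hypothetical and not part of any Derby API; the example only builds the SQL text and does not connect to a database.

```java
class DelimitedIdentifierDemo {
    // Hypothetical helper: wrap a name in double quotes, doubling any embedded quote.
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // A table name containing a non-ASCII letter (e-acute, written as an escape
        // so the source file itself stays ASCII).
        String table = quoteIdentifier("caf\u00e9");
        String sql = "SELECT * FROM " + table;
        System.out.println(sql);
    }
}
```

Keep in mind that a delimited identifier is case-sensitive, so the same quoted spelling has to be used every time the object is referenced.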

>
> Thanks - Ken
>
>
> David Van Couvering wrote:
>
> >Hi, all.  I am getting some questions from Ken Frank NetBeans
> >internationalization quality team about Java DB and character set
> >encodings.  Rather than try and play go-between, I'm including him
> >here so he can directly ask any follow-on questions.
> >
> >Ken would like to understand how Derby makes use of character
> >encodings, and how it is affected by  various settings.  How does
> >Derby handle things if the encoding is set to something different from
> >our default of UTF-8?  Are we impacted, or do we rely on Java routines
> >such as the Collator and Comparator class to handle this?
> >
> >Sorry if I'm talking out my ear, i18n is not one of my fortes.
> >
> >Thanks,
> >
> >David
> >
> >
>

Re: Derby and character set encodings

Posted by Ken Frank <Ke...@Sun.COM>.

Thanks David for sending this.

Let me note a few questions:

1.  When one creates a new database,
is the database created with a certain encoding that will be used?

And if so, is that encoding that of the locale I am in when I run
the create database commands, or is it always UTF-8?
(For example, for one of the Japanese locales of Solaris, the encoding
is EUC-JP.)

Or could it be the encoding of the locale the actual database server
is started in?  (That might be Java's view of the user's locale/encoding,
which I think would be the same as the OS locale the user is in.)

I saw this in the Derby docs:
"To support users in many different languages, Derby's SQL parser
understands all Unicode characters and allows any Unicode character or
number to be used in an identifier."

But I don't know whether that means there is no concept of an encoding
for the database itself.

I think with Oracle, for example, there is an argument to create database
that lets one specify its encoding.

2.  The locale the user is in when starting the Derby server:
what things are affected by that, i.e. encoding of the database, messages to
the user (if translated), time, date, etc.?
(As opposed to the user needing to set separate variables or properties.)

3.  I think it's allowed for identifiers like database names,
table names, and column names to contain non-ASCII characters, if proper
quoting is used when referring to them?

Thanks - Ken

David Van Couvering wrote:

>Hi, all.  I am getting some questions from Ken Frank NetBeans
>internationalization quality team about Java DB and character set
>encodings.  Rather than try and play go-between, I'm including him
>here so he can directly ask any follow-on questions.
>
>Ken would like to understand how Derby makes use of character
>encodings, and how it is affected by  various settings.  How does
>Derby handle things if the encoding is set to something different from
>our default of UTF-8?  Are we impacted, or do we rely on Java routines
>such as the Collator and Comparator class to handle this?
>
>Sorry if I'm talking out my ear, i18n is not one of my fortes.
>
>Thanks,
>
>David
>  
>