You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Brett Wooldridge <br...@gmail.com> on 2011/04/13 03:21:08 UTC

Strange collation

I have perhaps a misunderstanding/mis-expectation of a database created with
collation TERRITORY_BASED:PRIMARY.

>From the manual:

"Territory based with collation strength PRIMARY. Specify this value to
make Derby behave similarly to many other databases, for which PRIMARY is
commonly the default. PRIMARY typically means that only differences in base
letters are considered significant, whereas differences in accents or case
are not considered significant."

So, I was _hoping_ that indeed Derby would behave similarly to many other
databases, but ...

Assume I create a database as follows:

   jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY


And create a table as follows:


CREATE TABLE foo (

   id INTEGER GENERATED BY DEFAULT AS IDENTITY,

   column1 VARCHAR(255)

)


And insert a test row...


INSERT INTO foo (column1) VALUES ('test')


A "standard" query works as expected ...


SELECT * FROM foo WHERE column1 LIKE 'TE%'


... does indeed perform a case in-sensitive LIKE as desired, and returns
this row:


ID    COLUMN1

-------------------------------

1      test


So far, so good.  Now this query...


SELECT * FROM foo WHERE id = '1'   -- Note the quoted '1' rather than a raw
1


... returns this error:


Error: Comparisons between 'INTEGER' and 'CHAR (UCS_BASIC)' are not
supported. Types must be comparable. String types must also have matching
collation. If collation does not match, a possible solution is to cast
operands to force them to the default collation (e.g. SELECT tablename FROM
sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 'T1')

SQLState:  42818

ErrorCode: -1


Unfortunately, this is VERY unlike other databases.  This is a trivial query
which I would expect to work.  If Derby knows the collation the the
database, why are casts not automatically performed?  I actually encountered
this error first with this query to list all tables:


SELECT T2.* FROM SYS.SYSTABLES systabs, TABLE
(SYSCS_DIAG.SPACE_TABLE(systabs.tablename)) AS T2

WHERE systabs.tabletype = 'T'

ORDER BY isindex, conglomeratename


Which returns a similar error whereby the only solution is to
CAST(systabs.tabletype AS CHAR).


My company is, or rather was, in the process of switching our product from
Derby with default (UCS_BASIC) collation, to TERRITORY_BASED:PRIMARY in
order to solve performance issues related to the inability to perform case
in-sensitive queries (because using a function like LCASE() will not use an
index).


Unfortunately, given the size of the code-base (several hundred thousand
lines), the fact that a myriad of these CAST related errors could be hiding
in nooks and crannies is forcing us to reconsider.  Even if we _could_ find
and correct all queries were something like an INTEGER is being compared to
a string, having to put CAST everywhere we might interact with system tables
seems gross.


Is there any solution to this?  Any connection mode, startup parameter, etc.
that we're missing?  Is there any reason Derby shouldn't perform these CAST
automatically?  No other database I've used (Oracle, SQL Server, MySQL,
PostgreSQL, ...) seem to have this problem.


Thanks,

Brett Wooldridge

Re: Strange collation

Posted by Rick Hillegas <ri...@oracle.com>.
Hi Brett,

Some comments inline...

On 4/12/11 6:21 PM, Brett Wooldridge wrote:
> I have perhaps a misunderstanding/mis-expectation of a database 
> created with collation TERRITORY_BASED:PRIMARY.
>
> From the manual:
>
> "Territory based with collation strength PRIMARY. Specify this value 
> to make Derby behave similarly to many other databases, for which 
> PRIMARY is commonly the default. PRIMARY typically means that only 
> differences in base letters are considered significant, whereas 
> differences in accents or case are not considered significant."
>
> So, I was _hoping_ that indeed Derby would behave similarly to many 
> other databases, but ...
>
> Assume I create a database as follows:
>
> jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
>
>
> And create a table as follows:
>
>
> CREATE TABLE foo (
>
>    id INTEGER GENERATED BY DEFAULT AS IDENTITY,
>
>    column1 VARCHAR(255)
>
> )
>
>
> And insert a test row...
>
>
> INSERT INTO foo (column1) VALUES ('test')
>
>
> A "standard" query works as expected ...
>
>
> SELECT * FROM foo WHERE column1 LIKE 'TE%'
>
>
> ... does indeed perform a case in-sensitive LIKE as desired, and 
> returns this row:
>
>
> ID    COLUMN1
>
> -------------------------------
>
> 1      test
>
>
> So far, so good.  Now this query...
>
>
> SELECT * FROM foo WHERE id = '1'   -- Note the quoted '1' rather than 
> a raw 1
>
>
> ... returns this error:
>
>
> Error: Comparisons between 'INTEGER' and 'CHAR (UCS_BASIC)' are not 
> supported. Types must be comparable. String types must also have 
> matching collation. If collation does not match, a possible solution 
> is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS 
> VARCHAR(128)) = 'T1')
>
> SQLState:  42818
>
> ErrorCode: -1
>
This query also fails in vanilla databases which have no collation 
specified. This behavior is described in the Derby Reference Guide in 
the section titled "Data type assignments and comparison, sorting, and 
ordering".
>
>
> Unfortunately, this is VERY unlike other databases.  This is a trivial 
> query which I would expect to work.  If Derby knows the collation the 
> the database, why are casts not automatically performed?  I actually 
> encountered this error first with this query to list all tables:
>
>
> SELECT T2.* FROM SYS.SYSTABLES systabs, TABLE 
> (SYSCS_DIAG.SPACE_TABLE(systabs.tablename)) AS T2
>
> WHERE systabs.tabletype = 'T'
>
> ORDER BY isindex, conglomeratename
>
>
> Which returns a similar error whereby the only solution is to 
> CAST(systabs.tabletype AS CHAR).
>
>
> My company is, or rather was, in the process of switching our product 
> from Derby with default (UCS_BASIC) collation, to 
> TERRITORY_BASED:PRIMARY in order to solve performance issues related 
> to the inability to perform case in-sensitive queries (because using a 
> function like LCASE() will not use an index).
>
Another solution, but one which takes up more space, is to build an 
index on a generated column:

CREATE TABLE foo (

    id INTEGER GENERATED BY DEFAULT AS IDENTITY,

    column1 VARCHAR(255),
    column2 varchar(255) generated always as ( lower( column1 ) )

);
create index lfooidx on foo( column2, id );


select * from foo where column2 = 'abc';


Hope this helps,
-Rick
>
>
> Unfortunately, given the size of the code-base (several hundred 
> thousand lines), the fact that a myriad of these CAST related errors 
> could be hiding in nooks and crannies is forcing us to reconsider. 
>  Even if we _could_ find and correct all queries were something like 
> an INTEGER is being compared to a string, having to put CAST 
> everywhere we might interact with system tables seems gross.
>
>
> Is there any solution to this?  Any connection mode, startup 
> parameter, etc. that we're missing?  Is there any reason Derby 
> shouldn't perform these CAST automatically?  No other database I've 
> used (Oracle, SQL Server, MySQL, PostgreSQL, ...) seem to have this 
> problem.
>
>
> Thanks,
>
> Brett Wooldridge
>
>