Posted to derby-dev@db.apache.org by Kathey Marsden <km...@sbcglobal.net> on 2007/07/20 20:15:52 UTC
Single character does not match high-value Unicode character with
collation TERRITORY_BASED. Is this a bug?
With TERRITORY_BASED collation, '_' does not match the character
\uFA2D. The behavior is the same for English and Norwegian. With
UCS_BASIC collation it matches fine. Could you tell me if this is a bug?
Here is a program to reproduce it.
Kathey
import java.sql.*;

public class HighCharacter {

    public static void main(String args[]) throws Exception
    {
        System.out.println("\n Territory no_NO");
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn =
            DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
        testLikeWithHighestValidCharacter(conn);
        conn.close();

        System.out.println("\n Territory en_US");
        conn =
            DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
        testLikeWithHighestValidCharacter(conn);
        conn.close();

        // No collation attribute, so the database gets Derby's default UCS_BASIC collation
        System.out.println("\n Collation UCS_BASIC");
        conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
        testLikeWithHighestValidCharacter(conn);
        conn.close();
    }

    public static void testLikeWithHighestValidCharacter(Connection conn)
        throws SQLException {
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate("drop table t1");
        } catch (SQLException se) {
            // drop failure ok: t1 does not exist yet on the first run
        }
        stmt.executeUpdate("create table t1(c11 int)");
        stmt.executeUpdate("insert into t1 values 1");
        stmt.close();

        // \uFA2D - the highest valid character according to
        // Character.isDefined() of JDK 1.4;
        PreparedStatement ps =
            conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
        String[] match = { "%", "_", "\uFA2D" };
        for (int i = 0; i < match.length; i++) {
            System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
            ps.setString(1, match[i]);
            ResultSet rs = ps.executeQuery();
            // The LIKE predicate is true only if the single row comes back
            if (rs.next() && rs.getString(1).equals("1"))
                System.out.println("PASS");
            else
                System.out.println("FAIL: no match");
            rs.close();
        }
        ps.close();
    }
}
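The thread later traces the failure to how many collation elements each character expands to. Those counts can be inspected directly with java.text; what follows is a minimal sketch, not part of the original report. The class name CollationElementDump is hypothetical, and the code assumes Collator.getInstance returns a RuleBasedCollator for these locales, which is what the JDK's built-in collators typically return but is not promised by the Collator API. Counts on a current JDK may also differ from the JDK 1.4-era runtime used in the thread.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: prints the collation elements a JDK collator yields per test string. */
public class CollationElementDump {

    /** All collation elements the collator produces for s, in order. */
    static int[] elementsOf(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        List<Integer> out = new ArrayList<>();
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            out.add(e);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String[] tags = { "no-NO", "en-US" };
        String[] samples = { "_", "%", "\uFA2D" };
        for (String tag : tags) {
            // Assumption: the JDK's Collator for this locale is a RuleBasedCollator.
            RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(Locale.forLanguageTag(tag));
            for (String s : samples) {
                int[] elems = elementsOf(c, s);
                System.out.printf("%s U+%04X: %d element(s) %s%n",
                        tag, (int) s.charAt(0), elems.length, Arrays.toString(elems));
            }
        }
    }
}
```

If a character prints more elements than '_' does, it is exposed to the element-count skip discussed below.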
Re: Single character does not match high-value Unicode character
with collation TERRITORY_BASED. Is this a bug?
Posted by Daniel John Debrunner <dj...@apache.org>.
Mamta Satoor wrote:
> The method above uses the passed RuleBasedCollator to find the collation
> element for '_'. For our specific example, in Norwegian, '_' translates
> into only one collation element (vs 2 elements for '\uFA2D'). When
> looking for '_', we eliminate only 1 collation element from the array
> created for '\uFA2D' because '_' got translated into 1 collation
> element.
That in itself looks like a bug. _ means match any single character; at
no point should the code translate _ into a collation element. The
use of _ as an 'any character' wildcard has no relationship to the collation
value of the underscore character.
Dan.
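Dan's point can be made concrete: if '_' and '%' are recognised as raw pattern characters and the value keeps its character boundaries, '_' consumes exactly one value character whatever its collation elements look like. The sketch below is a hypothetical illustration of that approach, not Derby's Like class or the eventual fix. It assumes literal characters match when their collation element arrays are equal, ignores ESCAPE and multi-character contractions, and uses simple backtracking for '%'.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: LIKE (no ESCAPE) that never turns '_' or '%' into collation elements. */
public class PerCharacterLike {

    /** All collation elements the collator produces for s. */
    static int[] elementsOf(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        List<Integer> out = new ArrayList<>();
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            out.add(e);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** One element array per code point of s (contractions across characters are ignored). */
    static int[][] perCharacter(RuleBasedCollator collator, String s) {
        return s.codePoints()
                .mapToObj(cp -> elementsOf(collator, new String(Character.toChars(cp))))
                .toArray(int[][]::new);
    }

    static boolean like(RuleBasedCollator collator, String value, String pattern) {
        return match(collator, perCharacter(collator, value), 0, pattern.codePoints().toArray(), 0);
    }

    private static boolean match(RuleBasedCollator c, int[][] v, int vi, int[] p, int pi) {
        while (pi < p.length) {
            if (p[pi] == '%') {
                // '%' may absorb zero or more whole characters: try each split point.
                for (int k = vi; k <= v.length; k++) {
                    if (match(c, v, k, p, pi + 1)) {
                        return true;
                    }
                }
                return false;
            }
            if (vi == v.length) {
                return false; // pattern still needs a character, value is exhausted
            }
            if (p[pi] != '_') {
                // Literal: compare one pattern character's elements with one value character's.
                int[] lit = elementsOf(c, new String(Character.toChars(p[pi])));
                if (!Arrays.equals(v[vi], lit)) {
                    return false;
                }
            }
            // '_' or a matched literal consumes exactly one value character,
            // however many collation elements that character expanded to.
            vi++;
            pi++;
        }
        return vi == v.length;
    }

    public static void main(String[] args) {
        RuleBasedCollator no = (RuleBasedCollator) Collator.getInstance(Locale.forLanguageTag("no-NO"));
        for (String pat : new String[] { "%", "_", "\uFA2D" }) {
            System.out.println(pat + " -> " + like(no, "\uFA2D", pat));
        }
    }
}
```

Under these assumptions, like(no, "\uFA2D", "_") is true because the expanded character still occupies a single entry in the per-character array.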
Re: Single character does not match high-value Unicode character with collation TERRITORY_BASED. Is this a bug?
Posted by Mamta Satoor <ms...@gmail.com>.
Hi Kathey,
I debugged the program below, and it looks like '_' not matching \uFA2D might
be a bug.
The actual comparison happens in existing code left over from the National
character types. SQLChar and the newly introduced collation classes each
have two methods:
public BooleanDataValue like(DataValueDescriptor pattern)
public BooleanDataValue like(DataValueDescriptor pattern, DataValueDescriptor escape) throws StandardException
In SQLChar, we check whether we are dealing with national character types
and, if so, run special code for its LIKE implementation. The same special
code is used by the collation-related classes such as CollatorSQLChar.
The special processing gets the collation elements for the character string
from the RuleBasedCollator, using
RuleBasedCollator.getCollationElementIterator(characterString.getString()).
Taking Norwegian as the specific example, '\uFA2D' converts into 2 collation
elements (not 1, and this is the cause of the problem). These collation
elements are passed as an int array to the following method in the
iapi.types.Like class:
public static Boolean like(int[] value, int valueLength, int[] pattern, int patternLength, RuleBasedCollator collator)
The method above uses the passed RuleBasedCollator to find the collation
elements for '_'. In our example, in Norwegian, '_' translates into only one
collation element (vs. 2 elements for '\uFA2D'). So when matching '_', we
skip only 1 collation element of the array created for '\uFA2D'. The
following code is copied from Like.like:
else if (matchSpecial(pat, pLoc, pEnd, anyCharInts))
{
    // regardless of the char, it matches
    vLoc += anyCharInts.length;
    pLoc += anyCharInts.length;
    result = checkLengths(vLoc, vEnd, pLoc, pat, pEnd, anyStringInts);
    if (result != null)
        return result;
}
So the code above cannot assume that every character in, say, Norwegian
maps to exactly one collation element just because '_' maps to one
collation element.
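A toy model of the mismatch: advancing the value by anyCharInts.length elements, as the quoted snippet does, leaves part of an expanded character unconsumed, whereas advancing by one character does not. The element values are hypothetical placeholders (only the counts, two for U+FA2D and one for '_', come from the Norwegian observation above), and the class and method names are illustrative, not Derby's.

```java
import java.util.Arrays;

/** Illustrative only: element-count skip vs. per-character skip for a single '_'. */
public class UnderscoreSkipSketch {

    // U+FA2D modelled as one character expanding to two collation elements (placeholder values)
    static final int[][] VALUE_GROUPS = { {101, 102} };
    // '_' modelled as a single collation element (placeholder value)
    static final int[] ANY_CHAR_INTS = { 7 };

    /** Mirrors the quoted logic: advance the value by as many elements as '_' has. */
    static boolean matchesBySkippingElements(int[][] valueGroups, int[] anyCharInts) {
        int[] flat = flatten(valueGroups);
        int vLoc = 0;
        vLoc += anyCharInts.length;   // skips 1 element, but the character had 2
        return vLoc == flat.length;   // a leftover element means "no match"
    }

    /** Per-character alternative: '_' consumes exactly one value character. */
    static boolean matchesBySkippingCharacter(int[][] valueGroups) {
        int vLoc = 0;
        vLoc += 1;                    // one character, however many elements it has
        return vLoc == valueGroups.length;
    }

    /** Concatenates the per-character element arrays into one array. */
    static int[] flatten(int[][] groups) {
        return Arrays.stream(groups).flatMapToInt(Arrays::stream).toArray();
    }

    public static void main(String[] args) {
        System.out.println("element skip:   " + matchesBySkippingElements(VALUE_GROUPS, ANY_CHAR_INTS));
        System.out.println("character skip: " + matchesBySkippingCharacter(VALUE_GROUPS));
    }
}
```

Running main prints false for the element-count skip and true for the per-character skip, which is the FAIL/PASS split seen in the repro program.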
I think we should go ahead and open a JIRA entry for this. I would like to
hear if anyone has any comments on this.
thanks,
Mamta
Re: Single character does not match high value unicode character with collation TERRITORY_BASED. Is this a bug
Posted by Mamta Satoor <ms...@gmail.com>.
Kathey, let me take a look at this.
thanks,
Mamta