You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-commits@db.apache.org by Apache Wiki <wi...@apache.org> on 2006/06/30 17:38:32 UTC

[Db-derby Wiki] Update of "LanguageBasedOrdering" by KatheyMarsden

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by KatheyMarsden:
http://wiki.apache.org/db-derby/LanguageBasedOrdering

New page:
This page describes how to implement a function to perform language based ordering.  It is largely copied from  the email suggestion by Daniel Debrunner [http://www.nabble.com/Language-based-ordering-is-possible-in-Derby-tf482505.html#a1313448 Language based ordering is possible in Derby] but incorporates suggestions from Oyvind to add language and country parameters and adds a function to sort by the JVM default locale.


== Why do we need a Language based ordering function? ==
By default Derby performs Unicode character comparison for ordering.  This is not suitable for many languages. A Derby function can be used to specify the correct Language for ordering.


== The Start of a Solution ==

The example is not a complete solution but a place to start.
Please fix up the example here on the Wiki as you make it better.

The trick is to write an user defined function that takes a string  and returns a value that is correct for ordering on. Say we have a function LOCALE_ORDER that takes language, country and value, then you can write:

{{{

SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',name)

}}}


Java provides all the functionality required, so it's really just a
few lines of Java code.

The code uses java.text.Collator to obtain a byte array that represents
the collation key of a string and then Derby sorts on that byte array, as
a VARCHAR FOR BIT DATA.

It needs some more investigation to ensure Derby orders the returned
byte arrays correctly, particularly if one byte array is larger than the other, when up to that point they have the same contents. Also more checking is needed to see if the Derby ordering matches the expected language collation.

Another issue is the length of the returned byte array. We would need to
see how long it can be and what would happen it it returned a value
longer than the declared type of the SQL function.

An alternative would be to change the Java method
to return a String representation of the byte array and sort on that.

One general issue with this approach is that indexes won't be used. [http://issues.apache.org/jira/browse/DERBY-455 DERBY-455] would help here.

== Example SQL Functions ==
There are two example functions for lanuguage based ordering.

"LOCALE_ORDER" takes a language, a country and a value parameter and returns a collation key. The locale can be specified in the query.

Syntax:

LOCALE_ORDER(LanguageCode, CountryCode, value)

LanguageCode  - lowercase two-letter ISO-639 code
CountryCode   - uppercase two-letter ISO-3166 code
value         - value to order on, typically a column name

Example:

{{{
SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',name)
}}}

JVM_DEFAULT_LOCALE_ORDER takes only a value and will sort based on the JVM default locale.

Syntax:

JVM_DEFAULT_LOCALE_ORDER(value)

value         - value to order on, typically a column name


Example:

{{{
SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(name)
}}}



To create the functions run the script below in ij
{{{
CONNECT 'jdbc:derby:testdb;create=true';

CREATE FUNCTION LOCALE_ORDER(
LANGUAGE_CODE VARCHAR(2),
COUNTRY_CODE VARCHAR(2),
VALUE VARCHAR(128))
RETURNS VARCHAR(255) FOR BIT DATA
LANGUAGE JAVA
PARAMETER STYLE JAVA
NO SQL
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'derbytest.LocaleOrder.localeOrder';

CREATE FUNCTION JVM_DEFAULT_LOCALE_ORDER(
VALUE VARCHAR(128))
RETURNS VARCHAR(255) FOR BIT DATA
LANGUAGE JAVA
PARAMETER STYLE JAVA
NO SQL
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'derbytest.LocaleOrder.defaultLocaleOrder';

}}}


== Java code for function implementation ==

{{{
package derbytest;

import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class LocaleOrder {


    /**
     *  Function that can be used by Derby for language based ordering.
     *  Uses java.text.Collator to obtain a byte array that represents
     *  the collation key of a string and then Derby sorts on that byte array.
     *
     * RESOLVE: These Comments from Dan who submitted the original code/idea
     *          to the Derby list
     *  - What will happen if one byte array is larger than the other,
     *    when up to that point they have the same
     *  - Test that Derby ordering matches the expected language collation.
     *  - An alternative is to return a String representation of the byte
     *   array and sort on that.
     *
     * @param language lowercase two-letter ISO-639 code.
     * @param country uppercase two-letter ISO-3166 code.
     * @param value Value for which collationKey is needed for ordering
     * @return collationKey for value
     *
     */
    public static byte[] localeOrder(String language, String country,
        			String value)
        {

    		Locale locale  = new Locale(language, country);
        	Collator collator = Collator.getInstance(locale);

                CollationKey key = collator.getCollationKey(value);

                byte[] rawData = key.toByteArray();

                return rawData;
        }

    /**
     * Function used for order by the JVM default Locale
     *
     * @param value Value for which collation key is needed for ordering
     * @return collation key for value based on default Locale
     * @see #localeOrder(String language, String country, String value)
     */
    public static byte[] defaultLocaleOrder(String value)
    {
    	Locale locale  = Locale.getDefault();
    	return localeOrder(locale.getLanguage(), locale.getCountry(), value);
    }
}

}}}


== Test Code ==


{{{
package derbytest;

import java.sql.*;
import java.util.Locale;


import org.apache.derby.jdbc.EmbeddedDriver;

public class Test {

        private static final String[] NAMES =
        {
                // Just Smith, Zebra, Acorn with alternate A,S and Z
                "Smith",
                "Zebra",
                "\u0104corn",
                "\u017Bebra",
                "Acorn",
                "\u015Amith",
                "aacorn",
        };

        public static void main(String[] args) throws SQLException
        {
                new EmbeddedDriver();
                Connection conn = DriverManager.getConnection("jdbc:derby:testdb");

                Statement s = conn.createStatement();

                try {
                s.execute("CREATE TABLE CUSTOMER(ID INT, NAME VARCHAR(40))");

                conn.setAutoCommit(false);
                PreparedStatement ps = conn.prepareStatement("INSERT INTO CUSTOMER VALUES(?,?)");

                for (int i = 0; i < NAMES.length; i++)
                {
                        ps.setInt(1, i);
                        ps.setString(2, NAMES[i]);
                        ps.executeUpdate();
                }
                conn.commit();
                ps.close();
                } catch (SQLException sqle)
                {

                }

                System.out.println("DEFAULT Unicode character set ordering");
                query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY NAME");

                System.out.println("Polish ordering");
                query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',NAME)");

                System.out.println("JVM Default Locale ordering for wherever you are:" +
                		Locale.getDefault());
                query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME)");

                System.out.println("Norwegian ordering by forcing the Norwegian default locale");
                Locale.setDefault(new Locale("no","NO"));

                query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME)");
        }

        private static void query(Statement s, String sql) throws SQLException
        {
                System.out.println(sql);
                ResultSet rs = s.executeQuery(sql);
                while (rs.next())
                {
                        System.out.print("  ");
                        System.out.print(rs.getString(1));
                        System.out.print(" , ");
                        System.out.println(rs.getString(2));
                }
                rs.close();
        }
}
}}}


== Test.java  Output ==

The test output shows the various language based orderings.  For example aacorn sorts to the bottom for Norwegian.

{{{
$ java derbytest.Test
DEFAULT Unicode character set ordering
SELECT ID, NAME FROM CUSTOMER ORDER BY NAME
  4 , Acorn
  0 , Smith
  1 , Zebra
  6 , aacorn
  2 , ?corn
  5 , ?mith
  3 , ?ebra
Polish ordering
SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',NAME)
  6 , aacorn
  4 , Acorn
  2 , ?corn
  0 , Smith
  5 , ?mith
  1 , Zebra
  3 , ?ebra
JVM Default Locale ordering for wherever you are:en_US
SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME)
  6 , aacorn
  4 , Acorn
  2 , ?corn
  0 , Smith
  5 , ?mith
  1 , Zebra
  3 , ?ebra
Norwegian ordering by forcing the Norwegian default locale
SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME)
  4 , Acorn
  2 , ?corn
  0 , Smith
  5 , ?mith
  1 , Zebra
  3 , ?ebra
  6 , aacorn
}}}


== Related Pages ==

["DerbySQLroutines"]