Posted to dev@phoenix.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/05 06:01:27 UTC

[jira] [Commented] (PHOENIX-4237) Allow sorting on (Java) collation keys for non-English locales

    [ https://issues.apache.org/jira/browse/PHOENIX-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192500#comment-16192500 ] 

ASF GitHub Bot commented on PHOENIX-4237:
-----------------------------------------

GitHub user shehzaadn opened a pull request:

    https://github.com/apache/phoenix/pull/275

    PHOENIX-4237: Add function to calculate Java collation keys

    This change implements a general solution for calculating Java collation keys by creating Java collators based on a user locale. These collation keys can then be used in an ORDER BY clause to sort strings in a natural-language-appropriate way. It adds a new Phoenix function, COLLKEY. Typical usage of this function is:
    
    select name from my_table order by COLLKEY(name, 'zh_TW')
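
    To illustrate the idea behind the function, here is a minimal sketch of how a collation key can stand in for a string when sorting. It uses only the JDK's java.text.Collator, not the ICU4J or grammaticus code this pull request relies on, so it is not the COLLKEY implementation itself. The class name CollationKeySketch and its helper methods are hypothetical names chosen for this example.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: uses the JDK collator, not the PR's ICU4J-based code.
public class CollationKeySketch {

    // Returns the collation key bytes for `value` under the given locale.
    public static byte[] collationKeyBytes(String value, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey key = collator.getCollationKey(value);
        // The byte array is ordered most significant byte first, so comparing
        // two arrays byte by byte gives the same result as comparing the keys.
        return key.toByteArray();
    }

    // Compares two byte arrays lexicographically, treating each byte as unsigned.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    // Sorts strings by the binary order of their collation keys, which is
    // roughly what ordering by a computed key column amounts to.
    public static List<String> sortByCollationKey(List<String> values, Locale locale) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((x, y) -> compareUnsigned(
                collationKeyBytes(x, locale), collationKeyBytes(y, locale)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("C", "b", "a");
        // Plain binary (code point) order puts uppercase "C" before lowercase "b".
        List<String> binary = new ArrayList<>(names);
        binary.sort(String::compareTo);
        System.out.println("binary order:        " + binary);
        System.out.println("collation-key order: " + sortByCollationKey(names, Locale.ENGLISH));
    }
}
```

    The point of the sketch is that, once a key has been computed, no locale-aware logic is needed at comparison time: an ordinary byte-wise sort of the keys yields the locale's order.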
    
    We use artifacts from the ICU4J project and the recently open-sourced grammaticus project (as Maven dependencies). We had to include some source code from ICU4J because some jars produced by that project aren't published in Maven. We also include code from Salesforce that has been licensed for open-source release but not yet published as artifacts in Maven.
    
    There are three commits that split the changes into three logical pieces:
    
    1) f8cb121: Add the external source code described above
    2) fdbb5e0: Make the changes to the Phoenix license required by the above (and fix what seems to be an existing bug)
    3) 98cfc10: The actual implementation of the COLLKEY function: new code that uses the code introduced above and the dependencies newly introduced via Maven.
    
    Thanks in advance to the Phoenix community for your feedback on this.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shehzaadn/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #275
    
----
commit f8cb121145163591345eea70acbc313098e23e21
Author: Shehzaad <sn...@salesforce.com>
Date:   2017-09-30T01:52:46Z

    (1) add ICU4J source code for charset/localespi jars and (2) add Salesforce i18n-util source code

commit fdbb5e009a767e0f6df385dc9a1a8472b32cc361
Author: Shehzaad <sn...@salesforce.com>
Date:   2017-10-02T17:55:39Z

    (1) Fix text of 3-clause BSD License, (2) add Unicode license, (3) add mention of bundling ICU4J and i18n-util code

commit 98cfc10bac3c48ec3e7ceb47bea0b60556265c85
Author: Shehzaad <sn...@salesforce.com>
Date:   2017-10-02T21:58:31Z

    add function COLLKEY to Phoenix to calculate a Java collation key on a given string with the collator derived from an ISO locale code and some other parameters

----


> Allow sorting on (Java) collation keys for non-English locales
> --------------------------------------------------------------
>
>                 Key: PHOENIX-4237
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4237
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Shehzaad Nakhoda
>
> Strings stored via Phoenix can be composed from a subset of the entire set of Unicode characters. The natural sort order of strings in a given language often differs from the order dictated by the binary representation of their characters. Java provides the concept of a Collator, which, given an input string and a (language) locale, can generate a Collation Key that can then be used to compare strings in that natural order.
> Salesforce has recently open-sourced grammaticus, and IBM open-sourced ICU4J some time ago. These technologies can be combined to provide a robust new Phoenix function that can be used in an ORDER BY clause to sort strings according to the user's locale.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)