Posted to issues@calcite.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2020/04/27 09:48:00 UTC

[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

    [ https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093270#comment-17093270 ] 

Stamatis Zampetakis commented on CALCITE-3951:
----------------------------------------------

Thanks for pushing this forward [~rubenql]. 

I am not sure that SqlCollation is the right place to keep the comparison logic. 

Here is what the SQL standard says about comparisons of character strings.

*4.2.2 Comparison of character strings*

Two character strings are comparable if and only if either they have the same character set or there exists at
least one collation that is applicable to both their respective character sets (which is possible only if the character
sets share the same repertoire).

A collation is defined by [ISO14651] as “a process by which two strings are determined to be in exactly one
of the relationships of less than, greater than, or equal to one another”. Each collation known in an
SQL-environment is applicable to one or more character sets, and for each character set, one or more collations are
applicable to it, one of which is associated with it as its character set collation.

Anything that has a declared type can, if that type is a character string type, be associated with a collation
applicable to its character set; this is known as a declared type collation. Every declared type that is a character
string type has a collation derivation, this being either none, implicit, or explicit. The collation derivation of a
declared type with a declared type collation that is explicitly or implicitly specified by a <data type> is implicit.
If the collation derivation of a declared type that has a declared type collation is not implicit, then it is explicit.
The collation derivation of an expression of character string type that has no declared type collation is none.

An operation that explicitly or implicitly involves character string comparison is a character comparison
operation. At least one of the operands of a character comparison operation shall have a declared type collation.

There may be an SQL-session collation for some or all of the character sets known to the SQL-implementation
(see Subclause 4.38, “SQL-sessions”).

The collation used for a particular character comparison is specified by Subclause 9.15, “Collation determination”.

The comparison of two character string expressions depends on the collation used for the comparison (see
Subclause 9.15, “Collation determination”). When values of unequal length are compared, if the collation for
the comparison has the NO PAD characteristic and the shorter value is equal to some prefix of the longer value,
then the shorter value is considered less than the longer value. If the collation for the comparison has the PAD
SPACE characteristic, for the purposes of the comparison, the shorter value is effectively extended to the length
of the longer by concatenation of <space>s on the right.

For every character set, there is at least one collation.
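
To make the NO PAD / PAD SPACE rule quoted above concrete, here is a minimal Java sketch. It is only an illustration and not Calcite code: the class and field names are hypothetical, and plain String.compareTo (UTF-16 code unit order) stands in for whatever ordering a real collation would define.

```java
import java.util.Comparator;

// Hypothetical illustration of the NO PAD and PAD SPACE characteristics.
// Assumption: String.compareTo ordering replaces a real collation's ordering.
final class PadComparison {

    /** NO PAD: a value equal to a proper prefix of a longer value sorts before it. */
    static final Comparator<String> NO_PAD = Comparator.naturalOrder();

    /** PAD SPACE: the shorter value is extended with spaces on the right first. */
    static final Comparator<String> PAD_SPACE = (a, b) -> {
        int len = Math.max(a.length(), b.length());
        return padRight(a, len).compareTo(padRight(b, len));
    };

    /** Appends spaces until the string reaches the requested length. */
    static String padRight(String s, int len) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < len) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abc" is a prefix of "abc  ", so NO PAD treats it as smaller.
        System.out.println(NO_PAD.compare("abc", "abc  ") < 0);
        // With PAD SPACE both values become "abc  " and compare as equal.
        System.out.println(PAD_SPACE.compare("abc", "abc  ") == 0);
    }
}
```

Both lines print true: the same pair of values is unequal under NO PAD and equal under PAD SPACE, which is why the pad characteristic has to travel with the collation used for the comparison.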

> Support different string comparison based on SqlCollation
> ---------------------------------------------------------
>
>                 Key: CALCITE-3951
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3951
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently SqlCollation defines concepts like Coercibility, Charset, Locale, etc. However, we cannot specify for a given collation that, e.g., a string field should use case-insensitive comparison. The goal of this ticket is to evolve SqlCollation to support that, and to adapt the corresponding classes to use that (optional) "non-standard" comparison.
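
As a rough illustration of the case-insensitive comparison mentioned in the description, the JDK's java.text.Collator can be configured per locale and strength. This is only a sketch of the general idea, not the API proposed in this ticket; the class and method names below are hypothetical, and Locale.ENGLISH is an arbitrary choice.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration: a locale-aware comparison whose case sensitivity
// is chosen through the Collator strength. Not part of Calcite.
final class CaseInsensitiveCollation {

    /** SECONDARY strength ignores case differences but keeps accent differences. */
    static Collator caseInsensitive(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        return collator;
    }

    /** TERTIARY strength also distinguishes upper case from lower case. */
    static Collator caseSensitive(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.TERTIARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator ci = caseInsensitive(Locale.ENGLISH);
        Collator cs = caseSensitive(Locale.ENGLISH);
        // Equal when case is ignored.
        System.out.println(ci.compare("calcite", "CALCITE") == 0);
        // Different when case is significant.
        System.out.println(cs.compare("calcite", "CALCITE") != 0);
    }
}
```

Since Collator implements Comparator, an instance like this could be handed to any code that sorts or compares strings, which is roughly the kind of hook the description asks for.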


