You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Vladimir Ozerov <pp...@gmail.com> on 2020/06/16 09:05:22 UTC

How to support case-insensitive identifier resolution?

Hi colleagues,

I am trying to implement case insensitive resolution of column/table/schema
names for our Apache Calcite integration in Hazelcast and got stuck. I hope
that the community might help me.

Consider that we have a table source that contains the following Java class:

class Employee {
    LocalDate birthDate
}

I would like to have an SQL engine that could resolve identifiers in
accordance with ANSI SQL standard, which basically says that unquoted
identifiers should be compared in a case-insensitive way, while quoted
identifiers should be compared in a case-sensitive way. Let's focus on
columns only for now:

SELECT birthDate FROM employee // OK
SELECT birthdate FROM employee // OK
SELECT BIRTHDATE FROM employee // OK
SELECT `birthDate` FROM employee // OK
SELECT `birthdate` FROM employee // Fail
SELECT `BIRTHDATE` FROM employee // Fail

That is, my source is a collection of Java objects, and the natural name of
the column is "birthDate". But I would like it to be accessible as
"birthDate", birthDate, BIRTHDate, etc.

My problem comes from the fact that casing configuration is applied during
parsing and by the time the table is asked for a column, the information
whether the user request was quoted or not is lost. Consider that I have a
table RelDataType["birthDate"]. Now consider what happens with different
combinations of casing configuration:
1) [unquoted=UNCHANGED, quoted=UNCHANGED]: "SELECT BIRTHDATE" doesn't work
obviously
2) [unquoted=UPPER, quoted=UNCHANGED]: "SELECT BIRTHDATE" doesn't work
again, because parser normalizes unqouted identifier to upper case, but
RelDataType has a column "birthDate"
3) Same as p.2, but with manual normalization of RelDataType
to RelDataType["BIRTHDATE"]: "SELECT BIRTHDATE" works now, but "SELECT
`birthDate`" don't!

Is there any built-in solution to the above problem?

Regards,
Vladimir.

Re: How to support case-insensitive identifier resolution?

Posted by Danny Chan <yu...@gmail.com>.
Thanks Viliam ~

Usually we use the quote character to avoid conflicts between the reserved keywords, i.e. DEFAULT is a reserved keyword in many vendors, if you want to use it as a column name or object name, then quoting is a choice.

The “create table …” example you gave seems problematic because the DDL and DML case-sensitivity should keep the same, thus, when you create the second table `MyTable`, the catalog show throw instead.

Best,
Danny Chan
在 2020年6月18日 +0800 PM10:02,Viliam Durina <vi...@hazelcast.com>,写道:
> > Does any SQL vendors behave like that ?
> I don't know of any vendor.
>
> > What do you mean by “sometimes"
> Quoted identifiers are sensitive and unquoted are not - that's what I meant
> that identifiers are "sometimes" case-sensitive.
>
> > SQL must have a deterministic semantic I think.
> You can define a deterministic rule to decide which object to match, but
> it's just weird: if you compare identifiers to object names in a
> case-insensitive way, then having `MyTable` and `MYTABLE` should not be
> allowed. This weird situation can happen:
> - you create `MYTABLE` object
> - you have a query `SELECT * FROM MyTable` in your application
> - later you create `MyTable` object. It succeeds because it has a distinct
> name
> - the query now reads from a different table, even though the original
> object wasn't changed, just a new, distinct one was created.
>
> My point is that the behavior originally requested probably isn't a good
> idea.
>
> Regards,
> Viliam
>
> On Thu, 18 Jun 2020 at 10:41, Danny Chan <yu...@gmail.com> wrote:
>
> > > when the identifier was
> > > quoted and case-insensitive, if it wasn’t
> >
> > Does any SQL vendors behave like that ? What do you mean by “sometimes”,
> > SQL must have a deterministic semantic I think.
> >
> > Best,
> > Danny Chan
> > 在 2020年6月18日 +0800 PM4:23,dev@calcite.apache.org,写道:
> > >
> > > when the identifier was
> > > quoted and case-insensitive, if it wasn't
> >
>
>
> --
> Viliam Durina
> Jet Developer
> hazelcast®
>
> <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
> USA
> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
>
> --
> This message contains confidential information and is intended only for the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast

Re: How to support case-insensitive identifier resolution?

Posted by Viliam Durina <vi...@hazelcast.com>.
> Does any SQL vendors behave like that ?
I don't know of any vendor.

> What do you mean by “sometimes"
Quoted identifiers are sensitive and unquoted are not - that's what I meant
that identifiers are "sometimes" case-sensitive.

> SQL must have a deterministic semantic I think.
You can define a deterministic rule to decide which object to match, but
it's just weird: if you compare identifiers to object names in a
case-insensitive way, then having `MyTable` and `MYTABLE` should not be
allowed. This weird situation can happen:
- you create `MYTABLE` object
- you have a query `SELECT * FROM MyTable` in your application
- later you create `MyTable` object. It succeeds because it has a distinct
name
- the query now reads from a different table, even though the original
object wasn't changed, just a new, distinct one was created.

My point is that the behavior originally requested probably isn't a good
idea.

Regards,
Viliam

On Thu, 18 Jun 2020 at 10:41, Danny Chan <yu...@gmail.com> wrote:

> > when the identifier was
> > quoted and case-insensitive, if it wasn’t
>
> Does any SQL vendors behave like that ? What do you mean by “sometimes”,
> SQL must have a deterministic semantic I think.
>
> Best,
> Danny Chan
> 在 2020年6月18日 +0800 PM4:23,dev@calcite.apache.org,写道:
> >
> > when the identifier was
> > quoted and case-insensitive, if it wasn't
>


-- 
Viliam Durina
Jet Developer
      hazelcast®

  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
USA
+1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Re: How to support case-insensitive identifier resolution?

Posted by Danny Chan <yu...@gmail.com>.
> when the identifier was
> quoted and case-insensitive, if it wasn’t

Does any SQL vendors behave like that ? What do you mean by “sometimes”, SQL must have a deterministic semantic I think.

Best,
Danny Chan
在 2020年6月18日 +0800 PM4:23,dev@calcite.apache.org,写道:
>
> when the identifier was
> quoted and case-insensitive, if it wasn't

Re: How to support case-insensitive identifier resolution?

Posted by Viliam Durina <vi...@hazelcast.com>.
What we want is to have case-sensitive matching when the identifier was
quoted and case-insensitive, if it wasn't. But I guess this is not
possible. It also opens the question that if identifiers can sometimes be
case-sensitive, then two objects names of which differ only in case can
exist. And if a case-insensitive identifier is used, which one to choose?

Viliam

On Tue, 16 Jun 2020 at 11:18, Danny Chan <yu...@gmail.com> wrote:

> How did you use the Calcite parser and validator/converter?
>
> If you use the code directly, you should config the Lex[1] correctly for
> the SqlParser.Config in FrameworkConfig (i.e. make the caseSensitive false).
>
> The CalcteCatalogReader would then initialize from this FrameworkConfig
> and it would resolve the identifiers base on the config.
>
> If you use through the Avatica JDBC driver, config the parameters,
> something like this:
>
> jdbc:calcite:caseSensitive=false ...
>
> [1]
> https://github.com/apache/calcite/blob/69f25863f5f4197c17927a39a82cbf1cffd12b80/core/src/main/java/org/apache/calcite/config/Lex.java#L37
>
> Best,
> Danny Chan
> 在 2020年6月16日 +0800 PM5:05,Vladimir Ozerov <pp...@gmail.com>,写道:
> > Hi colleagues,
> >
> > I am trying to implement case insensitive resolution of
> column/table/schema
> > names for our Apache Calcite integration in Hazelcast and got stuck. I
> hope
> > that the community might help me.
> >
> > Consider that we have a table source that contains the following Java
> class:
> >
> > class Employee {
> > LocalDate birthDate
> > }
> >
> > I would like to have an SQL engine that could resolve identifiers in
> > accordance with ANSI SQL standard, which basically says that unquoted
> > identifiers should be compared in a case-insensitive way, while quoted
> > identifiers should be compared in a case-sensitive way. Let's focus on
> > columns only for now:
> >
> > SELECT birthDate FROM employee // OK
> > SELECT birthdate FROM employee // OK
> > SELECT BIRTHDATE FROM employee // OK
> > SELECT `birthDate` FROM employee // OK
> > SELECT `birthdate` FROM employee // Fail
> > SELECT `BIRTHDATE` FROM employee // Fail
> >
> > That is, my source is a collection of Java objects, and the natural name
> of
> > the column is "birthDate". But I would like it to be accessible as
> > "birthDate", birthDate, BIRTHDate, etc.
> >
> > My problem comes from the fact that casing configuration is applied
> during
> > parsing and by the time the table is asked for a column, the information
> > whether the user request was quoted or not is lost. Consider that I have
> a
> > table RelDataType["birthDate"]. Now consider what happens with different
> > combinations of casing configuration:
> > 1) [unquoted=UNCHANGED, quoted=UNCHANGED]: "SELECT BIRTHDATE" doesn't
> work
> > obviously
> > 2) [unquoted=UPPER, quoted=UNCHANGED]: "SELECT BIRTHDATE" doesn't work
> > again, because parser normalizes unqouted identifier to upper case, but
> > RelDataType has a column "birthDate"
> > 3) Same as p.2, but with manual normalization of RelDataType
> > to RelDataType["BIRTHDATE"]: "SELECT BIRTHDATE" works now, but "SELECT
> > `birthDate`" don't!
> >
> > Is there any built-in solution to the above problem?
> >
> > Regards,
> > Vladimir.
>


-- 
Viliam Durina
Jet Developer
      hazelcast®

  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
USA
+1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Re: How to support case-insensitive identifier resolution?

Posted by Danny Chan <yu...@gmail.com>.
How did you use the Calcite parser and validator/converter?

If you use the code directly, you should config the Lex[1] correctly for the SqlParser.Config in FrameworkConfig (i.e. make the caseSensitive false).

The CalcteCatalogReader would then initialize from this FrameworkConfig and it would resolve the identifiers base on the config.

If you use through the Avatica JDBC driver, config the parameters, something like this:

jdbc:calcite:caseSensitive=false ...

[1] https://github.com/apache/calcite/blob/69f25863f5f4197c17927a39a82cbf1cffd12b80/core/src/main/java/org/apache/calcite/config/Lex.java#L37

Best,
Danny Chan
在 2020年6月16日 +0800 PM5:05,Vladimir Ozerov <pp...@gmail.com>,写道:
> Hi colleagues,
>
> I am trying to implement case insensitive resolution of column/table/schema
> names for our Apache Calcite integration in Hazelcast and got stuck. I hope
> that the community might help me.
>
> Consider that we have a table source that contains the following Java class:
>
> class Employee {
> LocalDate birthDate
> }
>
> I would like to have an SQL engine that could resolve identifiers in
> accordance with ANSI SQL standard, which basically says that unquoted
> identifiers should be compared in a case-insensitive way, while quoted
> identifiers should be compared in a case-sensitive way. Let's focus on
> columns only for now:
>
> SELECT birthDate FROM employee // OK
> SELECT birthdate FROM employee // OK
> SELECT BIRTHDATE FROM employee // OK
> SELECT `birthDate` FROM employee // OK
> SELECT `birthdate` FROM employee // Fail
> SELECT `BIRTHDATE` FROM employee // Fail
>
> That is, my source is a collection of Java objects, and the natural name of
> the column is "birthDate". But I would like it to be accessible as
> "birthDate", birthDate, BIRTHDate, etc.
>
> My problem comes from the fact that casing configuration is applied during
> parsing and by the time the table is asked for a column, the information
> whether the user request was quoted or not is lost. Consider that I have a
> table RelDataType["birthDate"]. Now consider what happens with different
> combinations of casing configuration:
> 1) [unquoted=UNCHANGED, quoted=UNCHANGED]: "SELECT BIRTHDATE" doesn't work
> obviously
> 2) [unquoted=UPPER, quoted=UNCHANGED]: "SELECT BIRTHDATE" doesn't work
> again, because parser normalizes unqouted identifier to upper case, but
> RelDataType has a column "birthDate"
> 3) Same as p.2, but with manual normalization of RelDataType
> to RelDataType["BIRTHDATE"]: "SELECT BIRTHDATE" works now, but "SELECT
> `birthDate`" don't!
>
> Is there any built-in solution to the above problem?
>
> Regards,
> Vladimir.