Posted to issues@drill.apache.org by "Volodymyr Vysotskyi (JIRA)" <ji...@apache.org> on 2017/07/13 13:43:00 UTC

[jira] [Comment Edited] (DRILL-4264) Dots in identifier are not escaped correctly

    [ https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085700#comment-16085700 ] 

Volodymyr Vysotskyi edited comment on DRILL-4264 at 7/13/17 1:42 PM:
---------------------------------------------------------------------

Currently Drill behaves inconsistently when querying a file whose field names contain dots. The query
{code:sql}
select * from test_table t
{code}
fails, but the query
{code:sql}
select `rk.q` as `rk.q` from test_table t
{code}
returns the correct result for the file
{noformat}
{"rk.q": "a", "m": {"a.b":"1", "a":{"b":"2"}, "c":"3"}}
{noformat}
The difference between these two cases is that in the second case the field reference was created using the method {{FieldReference.getWithQuotedRef(field.getName())}}, which does not [check|https://github.com/apache/drill/blob/90f43bff7a01eaaee6c8861137759b05367dfcf3/logical/src/main/java/org/apache/drill/common/expression/FieldReference.java#L54] the field name. In the first case, the constructor that performs the check was [used|https://github.com/apache/drill/blob/416ec70a616e8d12b5c7fca809763b977d2f7aad/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L360].
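
The two construction paths can be illustrated with a minimal Java sketch. This is not Drill's code: the class {{SketchFieldRef}} and its factory methods {{checked}} and {{quoted}} are hypothetical stand-ins for the checked constructor and for {{FieldReference.getWithQuotedRef}}. The check is assumed to reject any name that contains a dot, and the error text is taken from the issue description.

```java
// Hypothetical stand-in for a field reference; not Drill's actual class.
final class SketchFieldRef {
    private final String name;

    private SketchFieldRef(String name) {
        this.name = name;
    }

    // Assumed behaviour of the checked path: reject names that look like a qualified name.
    static SketchFieldRef checked(String name) {
        if (name.contains(".")) {
            throw new UnsupportedOperationException(
                "Unhandled field reference \"" + name + "\"; a field reference "
                + "identifier must not have the form of a qualified name");
        }
        return new SketchFieldRef(name);
    }

    // Assumed behaviour of the quoted path: keep the whole name as one segment, with no check.
    static SketchFieldRef quoted(String name) {
        return new SketchFieldRef(name);
    }

    String name() {
        return name;
    }
}
```

Under these assumptions, {{SketchFieldRef.quoted("rk.q")}} succeeds while {{SketchFieldRef.checked("rk.q")}} throws, which mirrors the difference between the two queries above.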

A nested field may be selected in either of two ways:
{{t.m.c}} or {{t.m\['c'\]}}.
Without the field-name check, the query
{code:sql}
select t.m.`a.b`, t.m.a.b, t.m['a.b'] from test_table t
{code}
returns the correct result.
MySQL, for example, also allows quoted field names that contain dots.
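
To make the distinction between {{t.m.`a.b`}} and {{t.m.a.b}} concrete, here is a small Java sketch that resolves already-parsed path segments against the sample record. The class {{SketchPathResolver}} and its methods are hypothetical and are not part of Drill. The sketch assumes that a quoted name arrives as a single segment and that an unquoted dotted path arrives as several segments.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Drill: walks a nested map one path segment at a time.
final class SketchPathResolver {
    private SketchPathResolver() {}

    // Each list element is one parsed segment; a quoted name such as `a.b` stays whole.
    static Object resolve(Map<String, Object> row, List<String> segments) {
        Object current = row;
        for (String segment : segments) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a scalar or a missing value
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // The sample record from the comment, built by hand to avoid a JSON library.
    static Map<String, Object> sampleRow() {
        Map<String, Object> a = new HashMap<>();
        a.put("b", "2");
        Map<String, Object> m = new HashMap<>();
        m.put("a.b", "1");
        m.put("a", a);
        m.put("c", "3");
        Map<String, Object> row = new HashMap<>();
        row.put("rk.q", "a");
        row.put("m", m);
        return row;
    }
}
```

With this helper, the segments {{m}}, {{a.b}} resolve to "1" and the segments {{m}}, {{a}}, {{b}} resolve to "2", so the quoted and unquoted forms address two different fields of the same record.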

The preferred solution is to remove the check for fields with dots.
However, a user may forget to quote a field name that contains dots, so the query may return a result that the user does not expect.

Another solution is to add a session option that allows fields with dots and, depending on this option, either performs the field check or skips it. By default this option will be disabled (the same behaviour as now). The user will then be responsible for queries with forgotten quotes.
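
The option-based approach could look roughly like the following Java sketch. Everything here is an assumption made for illustration: the class {{SketchFieldNameValidator}} and the flag {{allowDots}} are hypothetical names, and no such option is claimed to exist in Drill.

```java
// Hypothetical sketch of the proposed option; the names are illustrative, not Drill's.
final class SketchFieldNameValidator {
    // Stand-in for the proposed session option; false corresponds to the current behaviour.
    private final boolean allowDots;

    SketchFieldNameValidator(boolean allowDots) {
        this.allowDots = allowDots;
    }

    // Returns the name unchanged when it is accepted; otherwise fails as the current check does.
    String validate(String fieldName) {
        if (!allowDots && fieldName.contains(".")) {
            throw new UnsupportedOperationException(
                "Unhandled field reference \"" + fieldName + "\"; a field reference "
                + "identifier must not have the form of a qualified name");
        }
        return fieldName;
    }
}
```

With the flag disabled, a dotted name is rejected exactly as today. With it enabled, the name passes through, and correct quoting becomes the user's responsibility.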


> Dots in identifier are not escaped correctly
> --------------------------------------------
>
>                 Key: DRILL-4264
>                 URL: https://issues.apache.org/jira/browse/DRILL-4264
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Alex
>            Assignee: Volodymyr Vysotskyi
>
> If you have some json data like this...
> {code:javascript}
>     {
>       "0.0.1":{
>         "version":"0.0.1",
>         "date_created":"2014-03-15"
>       },
>       "0.1.2":{
>         "version":"0.1.2",
>         "date_created":"2014-05-21"
>       }
>     }
> {code}
> ... there is no way to select any of the rows since their identifiers contain dots and when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference "0.0.1"; a field reference identifier must not have the form of a qualified name
> This must be fixed since there are many json data files containing dots in some of the keys (e.g. when specifying version numbers etc)


