You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Denys Ordynskiy (JIRA)" <ji...@apache.org> on 2019/08/08 13:33:00 UTC

[jira] [Commented] (DRILL-7322) Align cast boolean and schema boolean conversion

    [ https://issues.apache.org/jira/browse/DRILL-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902982#comment-16902982 ] 

Denys Ordynskiy commented on DRILL-7322:
----------------------------------------

I found 3 another data types that have differences between the Drill internal casting using schema and external casting by cast( as ) function.

*Float*
all_types.csvh
|float_col|varchar_float_col|
|340282356779733661637539395458142568447|340282356779733661637539395458142568447|
|-340282356779733661637539395458142568447|-340282356779733661637539395458142568447|
{noformat}
create schema (
  float_col float not null,
  varchar_float_col varchar not null
) for table dfs.tmp.`ischema/all_types`;
{noformat}
{noformat}
select
  float_col,
  sqlTypeOf(float_col) float_col_type,
  cast(varchar_float_col as float) casted_float,
  sqlTypeOf(cast(varchar_float_col as float)) casted_float_type,
  varchar_float_col
from dfs.tmp.`ischema/all_types`;
{noformat}
Result:
|float_col|float_col_type|casted_float|casted_float_type|varchar_float_col|
|{color:red}*Infinity*{color}|FLOAT|{color:red}*3.4028235E38*{color}|FLOAT|340282356779733661637539395458142568447|
|{color:red}*-Infinity*{color}|FLOAT|{color:red}*-3.4028235E38*{color}|FLOAT|-340282356779733661637539395458142568447|

------------------------------------------

*Date*
all_types.csvh
|date_col|varchar_date_col|
|1-1-1|1-1-1|
{noformat}
create schema (
  date_col date not null,
  varchar_date_col varchar not null
) for table dfs.tmp.`ischema/all_types`;
{noformat}
{noformat}
select
  date_col,
  sqlTypeOf(date_col) date_col_type,
  cast(varchar_date_col as date) casted_date_col,
  sqlTypeOf(cast(varchar_date_col as date)) casted_date_col_type,
  TO_DATE(varchar_date_col, 'yyyy-MM-dd') to_date_col,
  sqlTypeOf(TO_DATE(varchar_date_col, 'yyyy-MM-dd')) to_date_col_type,
  varchar_date_col
from dfs.tmp.`ischema/all_types`;
{noformat}
Result:
|date_col|date_col_type|casted_date_col|casted_date_col_type|to_date_col|to_date_col_type|varchar_date_col|
|{color:red}*0001-01-01*{color}|DATE|{color:red}*2001-01-01*{color}|DATE|{color:red}*0001-01-01*{color}|DATE|1-1-1|

------------------------------------------

*Time*
all_types.csvh
|time_col|varchar_time_col|
|01:01:001|01:01:001|
|01:001:001|01:001:001|
|001:001:001|001:001:001|
|1:1:1.|1:1:1.|
{noformat}
create schema (
  time_col time not null,
  varchar_time_col varchar not null
) for table dfs.tmp.`ischema/all_types`;
{noformat}
{noformat}
select
  varchar_time_col,
  time_col,
  sqlTypeOf(time_col) time_col_type,
  cast(varchar_time_col as time) casted_time_col,
  sqlTypeOf(cast(varchar_time_col as time)) casted_time_col_type
from dfs.tmp.`ischema/all_types`;
{noformat}
|varchar_time_col|time_col|time_col_type|casted_time_col|casted_time_col_type|
|01:01:001|{color:red}*ERROR*{color}|{color:red}*ERROR*{color}|01:01:01|TIME|
|01:001:001|{color:red}*ERROR*{color}|{color:red}*ERROR*{color}|01:01:01|TIME|
|001:001:001|{color:red}*ERROR*{color}|{color:red}*ERROR*{color}|01:01:01|TIME|
|1:1:1.|{color:red}*ERROR*{color}|{color:red}*ERROR*{color}|01:01:01|TIME|

'01:01:001'		-	Error: DATA_READ ERROR: Invalid format: "01:01:{color:red}*001*{color}" is malformed at "1"
'01:001:001'	-	Error: DATA_READ ERROR: Invalid format: "01:{color:red}*001:001*{color}" is malformed at "1:001"
'001:001:001'	-	Error: DATA_READ ERROR: Invalid format: "{color:red}*001:001:001*{color}" is malformed at "1:001:001"
'1:1:1.'		-	Error: DATA_READ ERROR: Invalid format: "{color:red}*1:1:1.*{color}" is malformed at "."

Schema file reading and to_time() function throws an error if there are > 2 digits in the time unit. But cast( as time) works fine with '001' time literals.

> Align cast boolean and schema boolean conversion
> ------------------------------------------------
>
>                 Key: DRILL-7322
>                 URL: https://issues.apache.org/jira/browse/DRILL-7322
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Denys Ordynskiy
>            Priority: Major
>
> Information schema file allows converting any string to the boolean data type.
> But "case(.. as boolean)" statement throws an error:
> {color:#d04437}UserRemoteException : 	SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: a
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: a{color}
> *Information Schema file should allow using the same range of boolean literals as cast statement.*
> *Steps to reproduce:*
> Upload text file all_types.csvh to the DFS /tmp/ischema/all_types:
> {noformat}
> boolean_col,boolean_col_for_cast
> true,true
> 1,1
> t,t
> y,y
> yes,yes
> on,on
> false,false
> 0,0
> f,f
> n,n
> no,no
> off,off
> a,a
> -,-
> !,!
> `,`
> 7,7
> @,@
> ^,^
> *,*
> {noformat}
> *Create schema:*
> {noformat}
> create schema (boolean_col boolean, boolean_col_for_cast varchar) for table dfs.tmp.`ischema/all_types`
> {noformat}
> *Run the query without cast:*
> select boolean_col, sqlTypeOf(boolean_col) boolean_col_type, boolean_col_for_cast, sqlTypeOf(boolean_col_for_cast) boolean_col_for_cast_type from dfs.tmp.`ischema/all_types`
> |boolean_col|boolean_col_type|boolean_col_for_cast|boolean_col_for_cast_type|
> |true|BOOLEAN|true|CHARACTER VARYING|
> |true|BOOLEAN|1|CHARACTER VARYING|
> |true|BOOLEAN|t|CHARACTER VARYING|
> |true|BOOLEAN|y|CHARACTER VARYING|
> |true|BOOLEAN|yes|CHARACTER VARYING|
> |true|BOOLEAN|on|CHARACTER VARYING|
> |false|BOOLEAN|false|CHARACTER VARYING|
> |false|BOOLEAN|0|CHARACTER VARYING|
> |false|BOOLEAN|f|CHARACTER VARYING|
> |false|BOOLEAN|n|CHARACTER VARYING|
> |false|BOOLEAN|no|CHARACTER VARYING|
> |false|BOOLEAN|off|CHARACTER VARYING|
> |false|BOOLEAN|a|CHARACTER VARYING|
> |false|BOOLEAN|-|CHARACTER VARYING|
> |false|BOOLEAN|!|CHARACTER VARYING|
> |false|BOOLEAN|`|CHARACTER VARYING|
> |false|BOOLEAN|7|CHARACTER VARYING|
> |false|BOOLEAN|@|CHARACTER VARYING|
> |false|BOOLEAN|^|CHARACTER VARYING|
> |false|BOOLEAN|*|CHARACTER VARYING|
> *Run the query with cast:*
> select boolean_col, sqlTypeOf(boolean_col) boolean_col_type, cast(boolean_col_for_cast as boolean) boolean_col_for_cast, sqlTypeOf(cast(boolean_col_for_cast as boolean)) boolean_col_for_cast_type from dfs.tmp.`ischema/all_types`
> {color:#d04437}UserRemoteException : 	SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: a
>     
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: *a*
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: b9deab6f-7fd4-40c0-acdf-b2e31747e16f on cv1:31010]{color}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)