You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2019/11/15 16:06:00 UTC

[jira] [Commented] (DRILL-7375) composite/nested type map/array convert_to/cast to varchar

    [ https://issues.apache.org/jira/browse/DRILL-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975198#comment-16975198 ] 

benj commented on DRILL-7375:
-----------------------------

Waiting for a possible official version of such a feature, it is possible to use an own UDF like:
{code:java}
package org.apache.drill.contrib.function;

import io.netty.buffer.DrillBuf;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.vector.complex.reader.FieldReader;
import org.apache.drill.exec.expr.holders.*;
import javax.inject.Inject;

public class ToString {
 
@FunctionTemplate(
    name = "to_string",
    scope = FunctionScope.SIMPLE,
    nulls = NullHandling.NULL_IF_NULL)
public static class NullableVarChar_Field_ToString implements DrillSimpleFunc {
  @Param FieldReader input;
  @Output VarCharHolder out;
  @Inject DrillBuf buffer;

  @Override public void setup() {
  }
  
  @Override public void eval() {
      String rowString = input.readObject().toString();

      buffer = buffer.reallocIfNeeded(rowString.length());
      buffer.setBytes(0, rowString.getBytes(), 0, rowString.length());
      out.start  = 0;
      out.end    = rowString.length();
      out.buffer = buffer;
  }
}
}
{code}

Example of use:
{code:sql}
apache drill> SELECT j, typeof(j) AS tj, to_string(j) AS strj, typeof(to_string(j)) AS tstrj 
FROM (SELECT convert_fromJSON('{a:["1","2","3"]}' ) j);
+---------------------+-----+---------------------+---------+
|          j          | tj  |        strj         |  tstrj  |
+---------------------+-----+---------------------+---------+
| {"a":["1","2","3"]} | MAP | {"a":["1","2","3"]} | VARCHAR |
+---------------------+-----+---------------------+---------+
1 row selected (0.132 seconds)
{code}

With this function it's possible to "cast" anything in varchar and avoid storage problem in Parquet due to certain types. And it is eventually possible to cast the other way when requesting the Parquet file. 



> composite/nested type map/array convert_to/cast to varchar
> ----------------------------------------------------------
>
>                 Key: DRILL-7375
>                 URL: https://issues.apache.org/jira/browse/DRILL-7375
>             Project: Apache Drill
>          Issue Type: Wish
>          Components: Functions - Drill
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Priority: Major
>
> As it possible to cast varchar to map (convert_from + JSON) with convert_from or transform a varchar to array (split)
> {code:sql}
> SELECT a, typeof(a), sqltypeof(a) from (SELECT CONVERT_FROM('{a : 100, b: 200}' ,'JSON') a);
> +-------------------+---------+--------+
> |         a         | EXPR$1  | EXPR$2 |
> +-------------------+---------+--------+
> | {"a":100,"b":200} | MAP     | STRUCT |
> +-------------------+---------+--------+
> SELECT a, typeof(a), sqltypeof(a)FROM (SELECT split(str,',') AS a FROM ( SELECT 'foo,bar' AS str));
> +-------------------+---------+--------+
> |        a          | EXPR$1  | EXPR$2 |
> +-------------------+---------+--------+
> | ["foo","bar"]     | VARCHAR | ARRAY  |
> +-------------------+---------+--------+
> {code}
> It will be very usefull :
>  # to have the capacity to "cast" the +_MAP_ into VARCHAR+ with a "cast syntax" or with a "convert_to" possibility
>  Expected:
> {code:sql}
> SELECT a, typeof(a) ta, va, typeof(va) tva FROM (
> SELECT a, CAST(a AS varchar) va FROM (SELECT CONVERT_FROM('{a : 100, b: 200}' ,'JSON') a));
> +-------------------+------+-------------------+---------+
> |         a         | ta   | va                | tva     |
> +-------------------+------+-------------------+---------+
> | {"a":100,"b":200} | MAP  | {"a":100,"b":200} | VARCHAR |
> +-------------------+------+-------------------+---------+
> {code}
>  # to have the capacity to "cast" the +_ARRAY_ into VARCHAR+ with a "cast syntax" or any other method
>  Expected
> {code:sql}
> SELECT a, sqltypeof(a) ta, va, sqltypeof(va) tva FROM (
> SELECT a, CAST(a AS varchar) va FROM (SELECT split(str,',') AS a FROM ( SELECT 'foo,bar' AS str));
> +-------------------+------+-------------------+---------+
> |         a         | ta   | va                | tva     |
> +-------------------+------+-------------------+---------+
> | ["foo","bar"]     | ARRAY| ["foo","bar"]     | VARCHAR |
> +-------------------+------+-------------------+---------+
> {code}
> Please note that these possibility of course exists in other database systems
>  Example with Postgres:
> {code:sql}
> SELECT '{"a":100,"b":200}'::json::text;
> => {"a":100,"b":200}
> SELECT array[1,2,3]::text;
> => {1,2,3}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)