You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/29 06:25:00 UTC

[jira] [Commented] (DRILL-6361) Provide a sqlTypeOf() or modeOf() function

    [ https://issues.apache.org/jira/browse/DRILL-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457916#comment-16457916 ] 

ASF GitHub Bot commented on DRILL-6361:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1242

    DRILL-6361: Revised typeOf() function versions

    Drill provides the `typeof()` function to return the type of a column. However, this function has two key limitations:
    
    1. It returns NULL if any column value is NULL. But, Drill has no NULL type, so this masks the underlying type. This is especially annoying for columns which are all NULL, such as "missing" columns.
    2. It does not return the cardinality (AKA "mode") of the column.
    
    This PR introduces two new functions that solve these issues.
    
    ### New Functions
    
    `sqlTypeOf()` returns the data type (using the SQL names) whether the column is NULL or not. The SQL name is the one that can be used in a CAST statement. Thus,
    
    ```
    sqlTypeOf( CAST(x AS <type> ))
    ```
    
    returns &lt;type> as the type name.
    
    `modeOf()` returns the cardinality (mode) of the column as "NOT NULL", "NULLABLE" or "ARRAY". (Suggestions for better terms are welcome.) The Drill terms are not used because they are more Parquet-like than SQL-like.
    
    Finally, the `drillTypeOf()` function that works just like `sqlTypeOf()`, but returns the internal Drill names.
    
    ### Example
    
    Here is an example usage that highlights our old friend, "nullable int" for a missing column:
    
    ```
    SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode FROM `json/all-null.json`;
    
    +----------+-----------+
    |  a_type  |  a_mode   |
    +----------+-----------+
    | INTEGER  | NULLABLE  |
    +----------+-----------+
    ```
    
    For arrays (repeated) types:
    
    ```
    SELECT sqlTypeOf(columns) as col_type, modeOf(columns) as col_mode
    FROM `csv/cust.csv`;
    
    +--------------------+-----------+
    |      col_type      | col_mode  |
    +--------------------+-----------+
    | CHARACTER VARYING  | ARRAY     |
    +--------------------+-----------+
    ```
    
    For non-null types:
    
    ```
    SELECT sqlTypeOf(`name`) AS name_type, 
    modeOf(`name`) AS name_mode FROM `csvh/cust.csvh`;
    
    +--------------------+------------+
    |     name_type      | name_mode  |
    +--------------------+------------+
    | CHARACTER VARYING  | NOT NULL   |
    +--------------------+------------+
    ```
    
    The result is that the internal Drill type is made very plain to the user of `sqlline`.
    
    ### UDF Utility Methods
    
    To save some typing, this PR also includes a few helper functions to make it easier to write UDFs. These functions were first described in the blog post [UDF Background Information](https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information), on the [Troublshooting](https://github.com/paul-rogers/drill/wiki/UDF-Troubleshooting) page.
    
    In particular, to return a string, the old `typeof()` implementation uses:
    
    ```
          byte[] type = typeName.getBytes();
          buf = buf.reallocIfNeeded(type.length);
          buf.setBytes(0, type);
          out.buffer = buf;
          out.start = 0;
          out.end = type.length;
    ```
    
    While the new functions use:
    
    ```
          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharOutput(
            typeName, buf, out);
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6361

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1242
    
----
commit 7acf6cc77581c15981cf5cc7ac1a2b3780324f40
Author: Paul Rogers <pr...@...>
Date:   2018-04-29T06:04:26Z

    DRILL-6361: Revised typeOf() function versions

----


> Provide a sqlTypeOf() or modeOf() function
> ------------------------------------------
>
>                 Key: DRILL-6361
>                 URL: https://issues.apache.org/jira/browse/DRILL-6361
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Drill provides a {{typeof()}} function to return the type of a column. The returned string, however, has only the base data type. A Drill data type (a "major type") also includes a cardinality (a "mode"). For example, {{OptionalĀ Int}} or {{Required VarChar}}.
> This type information is useful for handling data conversions. For example, if I could tell that a column value was a {{Nullable Int}}, I could guess that it is one Drill invented, and I could merge it, by hand, with the type from another file that had actual values.
> The two options are equivalent. Either provide a {{modeOf()}} to just return cardinality, or a {{dataTypeOf()}} that returns both. (Maybe the {{modeOf()}} might be more useful.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)