You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2020/06/04 13:22:00 UTC

[jira] [Updated] (DRILL-7747) Function to determine Unknow fields / on fly generated missing fields

     [ https://issues.apache.org/jira/browse/DRILL-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-7747:
------------------------
    Description: 
it would be really useful to have a function allowing to know if a field comes from an existing column or not.

With this data:
{code:sql}
apache drill 1.17> SELECT * FROM dfs.test.`f1.parquet`;
+---+--------+-------+
| a |   b    |   c   |
+---+--------+-------+
| 1 | test-1 | other |
| 2 | test-2 | null  |
| 3 | test-3 | old   |
+---+--------+-------+

apache drill 1.17> SELECT * FROM dfs.test.`f2.parquet`;
+----+---------+
| a  |    b    |
+----+---------+
| 10 | test-10 |
| 20 | test-20 |
| 30 | test-30 |
+----+---------+

apache drill 1.17> SELECT *, drilltypeof(c), modeof(c) FROM dfs.test.`f*.parquet`;
+------------+----+---------+-------+---------+----------+
|    dir0    | a  |    b    |   c   | EXPR$1  |  EXPR$2  |
+------------+----+---------+-------+---------+----------+
| f1.parquet | 1  | test-1  | other | VARCHAR | NULLABLE |
| f1.parquet | 2  | test-2  | null  | VARCHAR | NULLABLE |
| f1.parquet | 3  | test-3  | old   | VARCHAR | NULLABLE |
| f2.parquet | 10 | test-10 | null  | VARCHAR | NULLABLE |
| f2.parquet | 20 | test-20 | null  | VARCHAR | NULLABLE |
| f2.parquet | 30 | test-30 | null  | VARCHAR | NULLABLE |
+------------+----+---------+-------+---------+----------+
{code}

It will be nice to know when 'c' data is present because the column exists in the Parquet  (or other type file) or if the value NULL was generated because the column was missing.

Example a function 'origin' that take a column name and return for each row if the value was 'generated' or 'original' (other/better keyword could be choose (exist(column)=>true/false))
Virtual Example with previous data:
{code:sql}
apache drill> SELECT *, drilltypeof(c), modeof(c), origin(c) AS origin FROM dfs.test.`f*.parquet`;
+------------+----+---------+-------+---------+----------+-----------+
|    dir0    | a  |    b    |   c   | EXPR$1  |  EXPR$2  |  origin   |
+------------+----+---------+-------+---------+----------+-----------+
| f1.parquet | 1  | test-1  | other | VARCHAR | NULLABLE | original |
| f1.parquet | 2  | test-2  | null  | VARCHAR | NULLABLE | original |
| f1.parquet | 3  | test-3  | old   | VARCHAR | NULLABLE | original |
| f2.parquet | 10 | test-10 | null  | VARCHAR | NULLABLE | generated |
| f2.parquet | 20 | test-20 | null  | VARCHAR | NULLABLE | generated |
| f2.parquet | 30 | test-30 | null  | VARCHAR | NULLABLE | generated |
+------------+----+---------+-------+---------+----------+-----------+
{code}




  was:
it would be really useful to have a function allowing to know if a field comes from an existing column or not.

With this data:
{code:sql}
apache drill 1.17> SELECT * FROM dfs.test.`f1.parquet`;
+---+--------+-------+
| a |   b    |   c   |
+---+--------+-------+
| 1 | test-1 | other |
| 2 | test-2 | null  |
| 3 | test-3 | old   |
+---+--------+-------+

apache drill 1.17> SELECT * FROM dfs.test.`f2.parquet`;
+----+---------+
| a  |    b    |
+----+---------+
| 10 | test-10 |
| 20 | test-20 |
| 30 | test-30 |
+----+---------+

apache drill 1.17> SELECT *, drilltypeof(c), modeof(c) FROM dfs.test.`f*.parquet`;
+------------+----+---------+-------+---------+----------+
|    dir0    | a  |    b    |   c   | EXPR$1  |  EXPR$2  |
+------------+----+---------+-------+---------+----------+
| f1.parquet | 1  | test-1  | other | VARCHAR | NULLABLE |
| f1.parquet | 2  | test-2  | null  | VARCHAR | NULLABLE |
| f1.parquet | 3  | test-3  | old   | VARCHAR | NULLABLE |
| f2.parquet | 10 | test-10 | null  | VARCHAR | NULLABLE |
| f2.parquet | 20 | test-20 | null  | VARCHAR | NULLABLE |
| f2.parquet | 30 | test-30 | null  | VARCHAR | NULLABLE |
+------------+----+---------+-------+---------+----------+
{code}

It will be nice to know when 'c' data is present because the column exists in the Parquet  (or other type file) or if the value NULL was generated because the column was missing.

Example a function 'origin' that take a column name and return for each row if the value was 'generated' or 'original' (other/better keyword could be choose)
Virtual Example with previous data:
{code:sql}
apache drill> SELECT *, drilltypeof(c), modeof(c), origin(c) AS origin FROM dfs.test.`f*.parquet`;
+------------+----+---------+-------+---------+----------+-----------+
|    dir0    | a  |    b    |   c   | EXPR$1  |  EXPR$2  |  origin   |
+------------+----+---------+-------+---------+----------+-----------+
| f1.parquet | 1  | test-1  | other | VARCHAR | NULLABLE | original |
| f1.parquet | 2  | test-2  | null  | VARCHAR | NULLABLE | original |
| f1.parquet | 3  | test-3  | old   | VARCHAR | NULLABLE | original |
| f2.parquet | 10 | test-10 | null  | VARCHAR | NULLABLE | generated |
| f2.parquet | 20 | test-20 | null  | VARCHAR | NULLABLE | generated |
| f2.parquet | 30 | test-30 | null  | VARCHAR | NULLABLE | generated |
+------------+----+---------+-------+---------+----------+-----------+
{code}





> Function to determine Unknow fields / on fly generated missing fields
> ---------------------------------------------------------------------
>
>                 Key: DRILL-7747
>                 URL: https://issues.apache.org/jira/browse/DRILL-7747
>             Project: Apache Drill
>          Issue Type: Wish
>          Components: Functions - Drill
>    Affects Versions: 1.17.0
>            Reporter: benj
>            Priority: Major
>
> it would be really useful to have a function allowing to know if a field comes from an existing column or not.
> With this data:
> {code:sql}
> apache drill 1.17> SELECT * FROM dfs.test.`f1.parquet`;
> +---+--------+-------+
> | a |   b    |   c   |
> +---+--------+-------+
> | 1 | test-1 | other |
> | 2 | test-2 | null  |
> | 3 | test-3 | old   |
> +---+--------+-------+
> apache drill 1.17> SELECT * FROM dfs.test.`f2.parquet`;
> +----+---------+
> | a  |    b    |
> +----+---------+
> | 10 | test-10 |
> | 20 | test-20 |
> | 30 | test-30 |
> +----+---------+
> apache drill 1.17> SELECT *, drilltypeof(c), modeof(c) FROM dfs.test.`f*.parquet`;
> +------------+----+---------+-------+---------+----------+
> |    dir0    | a  |    b    |   c   | EXPR$1  |  EXPR$2  |
> +------------+----+---------+-------+---------+----------+
> | f1.parquet | 1  | test-1  | other | VARCHAR | NULLABLE |
> | f1.parquet | 2  | test-2  | null  | VARCHAR | NULLABLE |
> | f1.parquet | 3  | test-3  | old   | VARCHAR | NULLABLE |
> | f2.parquet | 10 | test-10 | null  | VARCHAR | NULLABLE |
> | f2.parquet | 20 | test-20 | null  | VARCHAR | NULLABLE |
> | f2.parquet | 30 | test-30 | null  | VARCHAR | NULLABLE |
> +------------+----+---------+-------+---------+----------+
> {code}
> It will be nice to know when 'c' data is present because the column exists in the Parquet  (or other type file) or if the value NULL was generated because the column was missing.
> Example a function 'origin' that take a column name and return for each row if the value was 'generated' or 'original' (other/better keyword could be choose (exist(column)=>true/false))
> Virtual Example with previous data:
> {code:sql}
> apache drill> SELECT *, drilltypeof(c), modeof(c), origin(c) AS origin FROM dfs.test.`f*.parquet`;
> +------------+----+---------+-------+---------+----------+-----------+
> |    dir0    | a  |    b    |   c   | EXPR$1  |  EXPR$2  |  origin   |
> +------------+----+---------+-------+---------+----------+-----------+
> | f1.parquet | 1  | test-1  | other | VARCHAR | NULLABLE | original |
> | f1.parquet | 2  | test-2  | null  | VARCHAR | NULLABLE | original |
> | f1.parquet | 3  | test-3  | old   | VARCHAR | NULLABLE | original |
> | f2.parquet | 10 | test-10 | null  | VARCHAR | NULLABLE | generated |
> | f2.parquet | 20 | test-20 | null  | VARCHAR | NULLABLE | generated |
> | f2.parquet | 30 | test-30 | null  | VARCHAR | NULLABLE | generated |
> +------------+----+---------+-------+---------+----------+-----------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)