You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Paul Rogers <pa...@yahoo.com.INVALID> on 2018/04/26 04:22:46 UTC

Display column data type without code

Hi All,
Anyone know if there is a non-code way to display the data types of columns returned from a Drill query? Sqlline appears to only show the column names and values. The same is true of the Drill web console.
The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which are only known at run time.) Is there a statement, system table or some other trick to display column types in, say, Sqlline?
In the past, I've gotten the types by using unit test style code. But, that is not to handy for use as an example for non-developers...
Thanks,
- Paul


Re: Display column data type without code

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Turns out I really needed better type functions in order to explain the nuances of Drill types, so I went ahead and created them.

See DRILL-6361, PR #1242 [1]. Examples shown in the PR. Reviewers very much appreciated.

Thanks,
- Paul

[1] https://github.com/apache/drill/pull/1242

 

    On Saturday, April 28, 2018, 5:58:47 PM PDT, Charles Givre <cg...@gmail.com> wrote:  
 
 I’d like to weigh in here, but this would be EXTREMELY useful.  When I was trying to write connectors to enable various BI tools to connect to Drill, such as SQLPad and Metabase, the inability to get information about how drill interprets the data was really difficult to get around.  Just me .02. 

> On Apr 28, 2018, at 18:05, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
> 
> Hi Rob,
> 
> Thanks for the suggestion. While this works for Hive (as you showed), it does not work for CSV files:
> 
> DESCRIBE `csvh/cust.csvh`;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> +--------------+------------+--------------+
> 
> The typeof() function is handy, but does not report the "is nullable" (or repeated) "mode" of a column, and it loses the data type if a value is null. The following CSV file (with headers) uses non-nullable VARCHAR columns:
> 
> SELECT typeof(custId) FROM `csvh/cust.csvh`;
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> +----------+
> 
> Now, do something similar with JSON which uses a (nullable) VARCHAR:
> 
> SELECT typeof(a) FROM `json/str-null.json`;
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> | NULL    |
> +----------+
> 
> Finally, use a CSV file without headers, so that all columns are returned in the columns[] array:
> 
> SELECT typeof(columns) FROM `csv/cust.csv`;
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> +----------+
> 
> We know that the three "VARCHAR" are different because we know how Drill works internally. But, the output of sqlline does not express that knowledge.
> 
> Sqlline presents all data as strings, which often hides the data type and other details, making lit look like things work better than they actually do. You can see this by running a query against two JSON where a VarChar column is missing from one of the files. Drill guesses "nullable Int",  Sqlline shows the value as null, and typeof() shows the type as NULL, hiding the fact that there is actually a schema conflict (schema change) lurking in the data that manifests only if, say, you sort the data.
> 
> Bottom line: it seems that, at present, there isn't a good way at present (short of writing some Java code that uses the native Drill API) to get the actually, detailed type of a column with both data type and cardinality ("mode").
> 
> 
> So, would be great when explaining Drill concepts, if there was a clean non-code way to show people the actual structure of the data. (Yep, I know Drill is open source and welcomes contributions, so I'll try to offer a solution when I get time...)
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Thursday, April 26, 2018, 10:08:04 AM PDT, Rob Wu <ro...@gmail.com> wrote:  
> 
> Hi Paul,
> 
> You could also use DESCRIBE (https://drill.apache.org/docs/describe/).
> 
> 0: jdbc:drill:drillbit=localhost:31010> describe
> `hive.default`.`integer_table`
> . . . . . . . . . . . . . . . . . . . > ;
> +--------------+--------------------+--------------+
> | COLUMN_NAME  |    DATA_TYPE      | IS_NULLABLE  |
> +--------------+--------------------+--------------+
> | keycolumn    | CHARACTER VARYING  | YES          |
> | column1      | INTEGER            | YES          |
> +--------------+--------------------+--------
> 
> Best regards,
> 
> Rob
> 
> On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish <ag...@apache.org>
> wrote:
> 
>> Hey Paul,
>> 
>> You could use the typeof() function for this purpose. It takes a single
>> parameter - the column name.
>> 
>> For example:
>>> select typeof(c_current_cdemo_sk) from customer limit 1;
>> +---------+
>> | EXPR$0  |
>> +---------+
>> | BIGINT  |
>> +---------+
>> 1 row selected (0.472 seconds)
>> 
>> 
>> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers <pa...@yahoo.com.invalid>
>> wrote:
>> 
>>> Hi All,
>>> Anyone know if there is a non-code way to display the data types of
>>> columns returned from a Drill query? Sqlline appears to only show the
>>> column names and values. The same is true of the Drill web console.
>>> The EXPLAIN PLAN FOR ... command shows the query plan, but not type
>> (which
>>> are only known at run time.) Is there a statement, system table or some
>>> other trick to display column types in, say, Sqlline?
>>> In the past, I've gotten the types by using unit test style code. But,
>>> that is not to handy for use as an example for non-developers...
>>> Thanks,
>>> - Paul
>>> 
>>> 
>> 
  

Re: Display column data type without code

Posted by Charles Givre <cg...@gmail.com>.
I’d like to weigh in here, but this would be EXTREMELY useful.  When I was trying to write connectors to enable various BI tools to connect to Drill, such as SQLPad and Metabase, the inability to get information about how drill interprets the data was really difficult to get around.  Just me .02. 

> On Apr 28, 2018, at 18:05, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
> 
> Hi Rob,
> 
> Thanks for the suggestion. While this works for Hive (as you showed), it does not work for CSV files:
> 
> DESCRIBE `csvh/cust.csvh`;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> +--------------+------------+--------------+
> 
> The typeof() function is handy, but does not report the "is nullable" (or repeated) "mode" of a column, and it loses the data type if a value is null. The following CSV file (with headers) uses non-nullable VARCHAR columns:
> 
> SELECT typeof(custId) FROM `csvh/cust.csvh`;
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> +----------+
> 
> Now, do something similar with JSON which uses a (nullable) VARCHAR:
> 
> SELECT typeof(a) FROM `json/str-null.json`;
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> | NULL     |
> +----------+
> 
> Finally, use a CSV file without headers, so that all columns are returned in the columns[] array:
> 
> SELECT typeof(columns) FROM `csv/cust.csv`;
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> +----------+
> 
> We know that the three "VARCHAR" are different because we know how Drill works internally. But, the output of sqlline does not express that knowledge.
> 
> Sqlline presents all data as strings, which often hides the data type and other details, making lit look like things work better than they actually do. You can see this by running a query against two JSON where a VarChar column is missing from one of the files. Drill guesses "nullable Int",  Sqlline shows the value as null, and typeof() shows the type as NULL, hiding the fact that there is actually a schema conflict (schema change) lurking in the data that manifests only if, say, you sort the data.
> 
> Bottom line: it seems that, at present, there isn't a good way at present (short of writing some Java code that uses the native Drill API) to get the actually, detailed type of a column with both data type and cardinality ("mode").
> 
> 
> So, would be great when explaining Drill concepts, if there was a clean non-code way to show people the actual structure of the data. (Yep, I know Drill is open source and welcomes contributions, so I'll try to offer a solution when I get time...)
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Thursday, April 26, 2018, 10:08:04 AM PDT, Rob Wu <ro...@gmail.com> wrote:  
> 
> Hi Paul,
> 
> You could also use DESCRIBE (https://drill.apache.org/docs/describe/).
> 
> 0: jdbc:drill:drillbit=localhost:31010> describe
> `hive.default`.`integer_table`
> . . . . . . . . . . . . . . . . . . . > ;
> +--------------+--------------------+--------------+
> | COLUMN_NAME  |    DATA_TYPE      | IS_NULLABLE  |
> +--------------+--------------------+--------------+
> | keycolumn    | CHARACTER VARYING  | YES          |
> | column1      | INTEGER            | YES          |
> +--------------+--------------------+--------
> 
> Best regards,
> 
> Rob
> 
> On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish <ag...@apache.org>
> wrote:
> 
>> Hey Paul,
>> 
>> You could use the typeof() function for this purpose. It takes a single
>> parameter - the column name.
>> 
>> For example:
>>> select typeof(c_current_cdemo_sk) from customer limit 1;
>> +---------+
>> | EXPR$0  |
>> +---------+
>> | BIGINT  |
>> +---------+
>> 1 row selected (0.472 seconds)
>> 
>> 
>> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers <pa...@yahoo.com.invalid>
>> wrote:
>> 
>>> Hi All,
>>> Anyone know if there is a non-code way to display the data types of
>>> columns returned from a Drill query? Sqlline appears to only show the
>>> column names and values. The same is true of the Drill web console.
>>> The EXPLAIN PLAN FOR ... command shows the query plan, but not type
>> (which
>>> are only known at run time.) Is there a statement, system table or some
>>> other trick to display column types in, say, Sqlline?
>>> In the past, I've gotten the types by using unit test style code. But,
>>> that is not to handy for use as an example for non-developers...
>>> Thanks,
>>> - Paul
>>> 
>>> 
>> 


Re: Display column data type without code

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Rob,

Thanks for the suggestion. While this works for Hive (as you showed), it does not work for CSV files:

DESCRIBE `csvh/cust.csvh`;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
+--------------+------------+--------------+

The typeof() function is handy, but does not report the "is nullable" (or repeated) "mode" of a column, and it loses the data type if a value is null. The following CSV file (with headers) uses non-nullable VARCHAR columns:

SELECT typeof(custId) FROM `csvh/cust.csvh`;
+----------+
|  EXPR$0  |
+----------+
| VARCHAR  |
+----------+

Now, do something similar with JSON which uses a (nullable) VARCHAR:

SELECT typeof(a) FROM `json/str-null.json`;
+----------+
|  EXPR$0  |
+----------+
| VARCHAR  |
| NULL     |
+----------+

Finally, use a CSV file without headers, so that all columns are returned in the columns[] array:

SELECT typeof(columns) FROM `csv/cust.csv`;
+----------+
|  EXPR$0  |
+----------+
| VARCHAR  |
+----------+

We know that the three "VARCHAR" are different because we know how Drill works internally. But, the output of sqlline does not express that knowledge.

Sqlline presents all data as strings, which often hides the data type and other details, making lit look like things work better than they actually do. You can see this by running a query against two JSON where a VarChar column is missing from one of the files. Drill guesses "nullable Int",  Sqlline shows the value as null, and typeof() shows the type as NULL, hiding the fact that there is actually a schema conflict (schema change) lurking in the data that manifests only if, say, you sort the data.

Bottom line: it seems that, at present, there isn't a good way at present (short of writing some Java code that uses the native Drill API) to get the actually, detailed type of a column with both data type and cardinality ("mode").


So, would be great when explaining Drill concepts, if there was a clean non-code way to show people the actual structure of the data. (Yep, I know Drill is open source and welcomes contributions, so I'll try to offer a solution when I get time...)

Thanks,
- Paul

 

    On Thursday, April 26, 2018, 10:08:04 AM PDT, Rob Wu <ro...@gmail.com> wrote:  
 
 Hi Paul,

You could also use DESCRIBE (https://drill.apache.org/docs/describe/).

0: jdbc:drill:drillbit=localhost:31010> describe
`hive.default`.`integer_table`
. . . . . . . . . . . . . . . . . . . > ;
+--------------+--------------------+--------------+
| COLUMN_NAME  |    DATA_TYPE      | IS_NULLABLE  |
+--------------+--------------------+--------------+
| keycolumn    | CHARACTER VARYING  | YES          |
| column1      | INTEGER            | YES          |
+--------------+--------------------+--------

Best regards,

Rob

On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish <ag...@apache.org>
wrote:

> Hey Paul,
>
> You could use the typeof() function for this purpose. It takes a single
> parameter - the column name.
>
> For example:
> > select typeof(c_current_cdemo_sk) from customer limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | BIGINT  |
> +---------+
> 1 row selected (0.472 seconds)
>
>
> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers <pa...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> > Anyone know if there is a non-code way to display the data types of
> > columns returned from a Drill query? Sqlline appears to only show the
> > column names and values. The same is true of the Drill web console.
> > The EXPLAIN PLAN FOR ... command shows the query plan, but not type
> (which
> > are only known at run time.) Is there a statement, system table or some
> > other trick to display column types in, say, Sqlline?
> > In the past, I've gotten the types by using unit test style code. But,
> > that is not to handy for use as an example for non-developers...
> > Thanks,
> > - Paul
> >
> >
>
  

Re: Display column data type without code

Posted by Rob Wu <ro...@gmail.com>.
Hi Paul,

You could also use DESCRIBE (https://drill.apache.org/docs/describe/).

0: jdbc:drill:drillbit=localhost:31010> describe
`hive.default`.`integer_table`
. . . . . . . . . . . . . . . . . . . > ;
+--------------+--------------------+--------------+
| COLUMN_NAME  |     DATA_TYPE      | IS_NULLABLE  |
+--------------+--------------------+--------------+
| keycolumn    | CHARACTER VARYING  | YES          |
| column1      | INTEGER            | YES          |
+--------------+--------------------+--------

Best regards,

Rob

On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish <ag...@apache.org>
wrote:

> Hey Paul,
>
> You could use the typeof() function for this purpose. It takes a single
> parameter - the column name.
>
> For example:
> > select typeof(c_current_cdemo_sk) from customer limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | BIGINT  |
> +---------+
> 1 row selected (0.472 seconds)
>
>
> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers <pa...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> > Anyone know if there is a non-code way to display the data types of
> > columns returned from a Drill query? Sqlline appears to only show the
> > column names and values. The same is true of the Drill web console.
> > The EXPLAIN PLAN FOR ... command shows the query plan, but not type
> (which
> > are only known at run time.) Is there a statement, system table or some
> > other trick to display column types in, say, Sqlline?
> > In the past, I've gotten the types by using unit test style code. But,
> > that is not to handy for use as an example for non-developers...
> > Thanks,
> > - Paul
> >
> >
>

Re: Display column data type without code

Posted by Abhishek Girish <ag...@apache.org>.
Hey Paul,

You could use the typeof() function for this purpose. It takes a single
parameter - the column name.

For example:
> select typeof(c_current_cdemo_sk) from customer limit 1;
+---------+
| EXPR$0  |
+---------+
| BIGINT  |
+---------+
1 row selected (0.472 seconds)


On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi All,
> Anyone know if there is a non-code way to display the data types of
> columns returned from a Drill query? Sqlline appears to only show the
> column names and values. The same is true of the Drill web console.
> The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which
> are only known at run time.) Is there a statement, system table or some
> other trick to display column types in, say, Sqlline?
> In the past, I've gotten the types by using unit test style code. But,
> that is not to handy for use as an example for non-developers...
> Thanks,
> - Paul
>
>

Re: Display column data type without code

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Thanks!

A it turns out, typeof() is not documented on the Apache Drill website which is why I was confused. Maybe we should document it.

Just tried typeof(). Turns out it only returns the type, not the nullability or repeated mode. Is there a separate function to get the "mode" information?

- Paul

 

    On Wednesday, April 25, 2018, 10:09:34 PM PDT, Aman Sinha <am...@apache.org> wrote:  
 
 You can do it through SQL using typeof() function.  Since there is no
global schema, Drill evaluates this for each row.

0: jdbc:drill:drillbit=10.10.101.41> select n_name, typeof(n_name) as
name_type, n_nationkey, typeof(n_nationkey) as nationkey_type from
cp.`tpch/nation.parquet` limit 2;

*+------------+------------+--------------+-----------------+*

*| **  n_name  ** | **name_type ** | **n_nationkey ** | **nationkey_type **
|*

*+------------+------------+--------------+-----------------+*

*| *ALGERIA  * | *VARCHAR  * | *0          * | *INT            * |*

*| *ARGENTINA * | *VARCHAR  * | *1          * | *INT            * |*

*+------------+------------+--------------+-----------------+*


On Wed, Apr 25, 2018 at 9:22 PM, Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi All,
> Anyone know if there is a non-code way to display the data types of
> columns returned from a Drill query? Sqlline appears to only show the
> column names and values. The same is true of the Drill web console.
> The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which
> are only known at run time.) Is there a statement, system table or some
> other trick to display column types in, say, Sqlline?
> In the past, I've gotten the types by using unit test style code. But,
> that is not to handy for use as an example for non-developers...
> Thanks,
> - Paul
>
>
  

Re: Display column data type without code

Posted by Aman Sinha <am...@apache.org>.
You can do it through SQL using typeof() function.  Since there is no
global schema, Drill evaluates this for each row.

0: jdbc:drill:drillbit=10.10.101.41> select n_name, typeof(n_name) as
name_type, n_nationkey, typeof(n_nationkey) as nationkey_type from
cp.`tpch/nation.parquet` limit 2;

*+------------+------------+--------------+-----------------+*

*| **  n_name  ** | **name_type ** | **n_nationkey ** | **nationkey_type **
|*

*+------------+------------+--------------+-----------------+*

*| *ALGERIA   * | *VARCHAR   * | *0           * | *INT            * |*

*| *ARGENTINA * | *VARCHAR   * | *1           * | *INT            * |*

*+------------+------------+--------------+-----------------+*


On Wed, Apr 25, 2018 at 9:22 PM, Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi All,
> Anyone know if there is a non-code way to display the data types of
> columns returned from a Drill query? Sqlline appears to only show the
> column names and values. The same is true of the Drill web console.
> The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which
> are only known at run time.) Is there a statement, system table or some
> other trick to display column types in, say, Sqlline?
> In the past, I've gotten the types by using unit test style code. But,
> that is not to handy for use as an example for non-developers...
> Thanks,
> - Paul
>
>