You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Petri Lehtonen <Pe...@Affecto.com> on 2015/12/21 11:18:34 UTC

Column aliases lost when using Dates + GROUP BY in SQL

Hi all,

I'm using Tableau with MapR Drill ODBC drivers.  I have successfully set up the drivers and am able to query the data from our source which is basically parquet files. All seems well as long as I'm using all rows from the source but when I try to limit the size of the results by grouping and rolling up the data I came across something that feels like a bug:

When I use any functions that return dates and add a group by to my query, the column alias in the underlying sql won't show up in the resultset.

Plain select from the source works:

SELECT 
  valid_from as valid_from_raw,
  date_add(to_date(valid_from),1)as valid_from_to_date
FROM
  dw.`/directory/parquet-file`
WHERE 
  valid_from = '1356991200000'

---------------+---------------------
valid_from_raw | valid_from_to_date
---------------+---------------------
1356991200000  | 2013-01-01
---------------+---------------------

When you add GROUP BY to the SQL, Drill looses the original column alias and replaces it with it's internal variable or something similar ($f1..n):

SELECT 
  valid_from as valid_from_raw,
  date_add(to_date(valid_from),1) as valid_from_to_date
FROM
  dw.`/directory/parquet-file`
WHERE
  valid_from = '1356991200000'
GROUP BY
  valid_from as valid_from_raw,
  date_add(to_date(valid_from),1)

---------------+-------------
valid_from_raw | $f1
---------------+-------------
1356991200000  | 2013-01-01
---------------+-------------

And naturally this creates a problem with Tableau which is mapping the column aliases from the initial SQL and fails to find the corresponding aliases from the resultset. Only way to get this working is to use $f0..n column aliases in the original query and rename them in Tableau... Something that one definitely shouldn't do. I'd appreciate if anyone has a solution or a more solid workaround.

Thanks,
Petri


Re: Column aliases lost when using Dates + GROUP BY in SQL

Posted by Andries Engelbrecht <ae...@maprtech.com>.
I tested on MapR Drill 1.3 and it seems to work correctly

select prod_id as prod, date_add(to_date(purchdate), 1) as date_purch from hive.orders where prod_id > 100 group by prod_id, date_add(to_date(purchdate), 1) limit 2;
+-------+-------------+
| prod  | date_purch  |
+-------+-------------+
| 226   | 2014-03-01  |
| 167   | 2014-02-11  |
+-------+-------------+


Here is the version information.

Version     : 1.3.0.201511202117
Release     : 1
Size        : 288 M
Repo        : installed
From repo   : /mapr-drill-1.3.0.201511202117-1.noarch


Above was with Hive as data source, so did a quick test on parquet files (similar data). Result was the same.

select prod_id as prod, date_add(to_date(purchdate), 1) as date_purch from dfs.par.`/orders` where prod_id > 100 group by prod_id, date_add(to_date(purchdate), 1) limit 2;
+-------+-------------+
| prod  | date_purch  |
+-------+-------------+
| 226   | 2014-03-01  |
| 167   | 2014-02-11  |
+-------+-------------+

Then tried using the same column, similar to your query.

select purchdate as purch_date, date_add(to_date(purchdate), 1) as date_purch from dfs.par.`/orders` where purchdate > '2014-02-28 12:53:06.0' group by purchdate, date_add(to_date(purchdate), 1) limit 2;
+------------------------+-------------+
|       purch_date       | date_purch  |
+------------------------+-------------+
| 2014-02-28 18:51:28.0  | 2014-03-01  |
| 2014-02-28 22:11:13.0  | 2014-03-01  |
+------------------------+-------------+


--Andries


> On Dec 21, 2015, at 11:08 PM, Petri Lehtonen <Pe...@Affecto.com> wrote:
> 
> Tableau TDC has the lines present.
> 
> The Drill version we have is 1.3.0. MapR developer preview.
> 
> I ran the query using sqlline and it gives me the same results:
> 
> SELECT DISTINCT
>  valid_from as valid_from_raw,
>  date_add(to_date(valid_from),1) as valid_from_to_date
> FROM
>  dw.`/directory/parquet-file`
> WHERE
>  valid_from = '1356991200000'
> GROUP BY
>  valid_from,
>  date_add(to_date(valid_from),1)
> 
> +-----------------+-------------+
> | valid_from_raw  |     $f1     |
> +-----------------+-------------+
> | 1356991200000   | 2013-01-01  |
> +-----------------+-------------+
> 
> And I get the same results regardless of the tool I'm using (sqlline, Drill explorer, dbVisualizer, Tableau).
> 
>> -----Original Message-----
>> From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
>> Sent: 21. joulukuuta 2015 18:53
>> To: user@drill.apache.org
>> Subject: Re: Column aliases lost when using Dates + GROUP BY in SQL
>> 
>> I think the issue is with the group by and using the column name and alias
>> clause 'group by valid_from as valid_from_raw'.
>> 
>> Removing the alias clause in the group by statement works as Aman shows.
>> 
>> Can you check your Tableau TDC file to see if these lines are present.
>>  <customization name='CAP_QUERY_GROUP_BY_ALIAS' value='no' />
>>  <customization name='CAP_QUERY_GROUP_BY_DEGREE' value='no' />
>> 
>> --Andries
>> 
>>> On Dec 21, 2015, at 8:25 AM, Aman Sinha <as...@maprtech.com> wrote:
>>> 
>>> The aliases work for me in the following query similar to yours.
>>> However, I am using latest master branch and running directly through
>>> sqlline command, not through Tableau.  Can you confirm what Drill
>>> version you are using and check if you can repro the behavior through
>>> sqlline ?  If so, you should file a JIRA.
>>> 
>>> select o_custkey as x, date_add(to_date(o_orderdate), 1) as y from
>>> cp.`tpch/orders.parquet` where o_orderkey < 10 group by o_custkey,
>>> date_add(to_date(o_orderdate), 1) limit 2;
>>> +------+-------------+
>>> |  x   |      y      |
>>> +------+-------------+
>>> | 370  | 1996-01-03  |
>>> | 781  | 1996-12-02  |
>>> +------+-------------+
>>> 
>>> On Mon, Dec 21, 2015 at 2:18 AM, Petri Lehtonen
>>> <Pe...@affecto.com>
>>> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I'm using Tableau with MapR Drill ODBC drivers.  I have successfully
>>>> set up the drivers and am able to query the data from our source
>>>> which is basically parquet files. All seems well as long as I'm using
>>>> all rows from the source but when I try to limit the size of the
>>>> results by grouping and rolling up the data I came across something that
>> feels like a bug:
>>>> 
>>>> When I use any functions that return dates and add a group by to my
>>>> query, the column alias in the underlying sql won't show up in the
>> resultset.
>>>> 
>>>> Plain select from the source works:
>>>> 
>>>> SELECT
>>>> valid_from as valid_from_raw,
>>>> date_add(to_date(valid_from),1)as valid_from_to_date FROM
>>>> dw.`/directory/parquet-file` WHERE  valid_from = '1356991200000'
>>>> 
>>>> ---------------+---------------------
>>>> valid_from_raw | valid_from_to_date
>>>> ---------------+---------------------
>>>> 1356991200000  | 2013-01-01
>>>> ---------------+---------------------
>>>> 
>>>> When you add GROUP BY to the SQL, Drill looses the original column
>>>> alias and replaces it with it's internal variable or something similar ($f1..n):
>>>> 
>>>> SELECT
>>>> valid_from as valid_from_raw,
>>>> date_add(to_date(valid_from),1) as valid_from_to_date FROM
>>>> dw.`/directory/parquet-file` WHERE  valid_from = '1356991200000'
>>>> GROUP BY
>>>> valid_from as valid_from_raw,
>>>> date_add(to_date(valid_from),1)
>>>> 
>>>> ---------------+-------------
>>>> valid_from_raw | $f1
>>>> ---------------+-------------
>>>> 1356991200000  | 2013-01-01
>>>> ---------------+-------------
>>>> 
>>>> And naturally this creates a problem with Tableau which is mapping
>>>> the column aliases from the initial SQL and fails to find the
>>>> corresponding aliases from the resultset. Only way to get this
>>>> working is to use $f0..n column aliases in the original query and rename
>> them in Tableau...
>>>> Something that one definitely shouldn't do. I'd appreciate if anyone
>>>> has a solution or a more solid workaround.
>>>> 
>>>> Thanks,
>>>> Petri
>>>> 
>>>> 
> 


RE: Column aliases lost when using Dates + GROUP BY in SQL

Posted by Petri Lehtonen <Pe...@Affecto.com>.
Tableau TDC has the lines present.

The Drill version we have is 1.3.0. MapR developer preview.

I ran the query using sqlline and it gives me the same results:

SELECT DISTINCT
  valid_from as valid_from_raw,
  date_add(to_date(valid_from),1) as valid_from_to_date
FROM
  dw.`/directory/parquet-file`
WHERE
  valid_from = '1356991200000'
GROUP BY
  valid_from,
  date_add(to_date(valid_from),1)

+-----------------+-------------+
| valid_from_raw  |     $f1     |
+-----------------+-------------+
| 1356991200000   | 2013-01-01  |
+-----------------+-------------+

And I get the same results regardless of the tool I'm using (sqlline, Drill explorer, dbVisualizer, Tableau).

> -----Original Message-----
> From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
> Sent: 21. joulukuuta 2015 18:53
> To: user@drill.apache.org
> Subject: Re: Column aliases lost when using Dates + GROUP BY in SQL
> 
> I think the issue is with the group by and using the column name and alias
> clause 'group by valid_from as valid_from_raw'.
> 
> Removing the alias clause in the group by statement works as Aman shows.
> 
> Can you check your Tableau TDC file to see if these lines are present.
>   <customization name='CAP_QUERY_GROUP_BY_ALIAS' value='no' />
>   <customization name='CAP_QUERY_GROUP_BY_DEGREE' value='no' />
> 
> --Andries
> 
> > On Dec 21, 2015, at 8:25 AM, Aman Sinha <as...@maprtech.com> wrote:
> >
> > The aliases work for me in the following query similar to yours.
> > However, I am using latest master branch and running directly through
> > sqlline command, not through Tableau.  Can you confirm what Drill
> > version you are using and check if you can repro the behavior through
> > sqlline ?  If so, you should file a JIRA.
> >
> > select o_custkey as x, date_add(to_date(o_orderdate), 1) as y from
> > cp.`tpch/orders.parquet` where o_orderkey < 10 group by o_custkey,
> > date_add(to_date(o_orderdate), 1) limit 2;
> > +------+-------------+
> > |  x   |      y      |
> > +------+-------------+
> > | 370  | 1996-01-03  |
> > | 781  | 1996-12-02  |
> > +------+-------------+
> >
> > On Mon, Dec 21, 2015 at 2:18 AM, Petri Lehtonen
> > <Pe...@affecto.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'm using Tableau with MapR Drill ODBC drivers.  I have successfully
> >> set up the drivers and am able to query the data from our source
> >> which is basically parquet files. All seems well as long as I'm using
> >> all rows from the source but when I try to limit the size of the
> >> results by grouping and rolling up the data I came across something that
> feels like a bug:
> >>
> >> When I use any functions that return dates and add a group by to my
> >> query, the column alias in the underlying sql won't show up in the
> resultset.
> >>
> >> Plain select from the source works:
> >>
> >> SELECT
> >>  valid_from as valid_from_raw,
> >>  date_add(to_date(valid_from),1)as valid_from_to_date FROM
> >> dw.`/directory/parquet-file` WHERE  valid_from = '1356991200000'
> >>
> >> ---------------+---------------------
> >> valid_from_raw | valid_from_to_date
> >> ---------------+---------------------
> >> 1356991200000  | 2013-01-01
> >> ---------------+---------------------
> >>
> >> When you add GROUP BY to the SQL, Drill looses the original column
> >> alias and replaces it with it's internal variable or something similar ($f1..n):
> >>
> >> SELECT
> >>  valid_from as valid_from_raw,
> >>  date_add(to_date(valid_from),1) as valid_from_to_date FROM
> >> dw.`/directory/parquet-file` WHERE  valid_from = '1356991200000'
> >> GROUP BY
> >>  valid_from as valid_from_raw,
> >>  date_add(to_date(valid_from),1)
> >>
> >> ---------------+-------------
> >> valid_from_raw | $f1
> >> ---------------+-------------
> >> 1356991200000  | 2013-01-01
> >> ---------------+-------------
> >>
> >> And naturally this creates a problem with Tableau which is mapping
> >> the column aliases from the initial SQL and fails to find the
> >> corresponding aliases from the resultset. Only way to get this
> >> working is to use $f0..n column aliases in the original query and rename
> them in Tableau...
> >> Something that one definitely shouldn't do. I'd appreciate if anyone
> >> has a solution or a more solid workaround.
> >>
> >> Thanks,
> >> Petri
> >>
> >>


Re: Column aliases lost when using Dates + GROUP BY in SQL

Posted by Andries Engelbrecht <ae...@maprtech.com>.
I think the issue is with the group by and using the column name and alias clause 'group by valid_from as valid_from_raw'.

Removing the alias clause in the group by statement works as Aman shows.

Can you check your Tableau TDC file to see if these lines are present.
  <customization name='CAP_QUERY_GROUP_BY_ALIAS' value='no' />
  <customization name='CAP_QUERY_GROUP_BY_DEGREE' value='no' />

--Andries

> On Dec 21, 2015, at 8:25 AM, Aman Sinha <as...@maprtech.com> wrote:
> 
> The aliases work for me in the following query similar to yours.  However,
> I am using latest master branch and running directly through sqlline
> command, not through Tableau.  Can you confirm what Drill version you are
> using and check if you can repro the behavior through sqlline ?  If so, you
> should file a JIRA.
> 
> select o_custkey as x, date_add(to_date(o_orderdate), 1) as y from
> cp.`tpch/orders.parquet` where o_orderkey < 10 group by o_custkey,
> date_add(to_date(o_orderdate), 1) limit 2;
> +------+-------------+
> |  x   |      y      |
> +------+-------------+
> | 370  | 1996-01-03  |
> | 781  | 1996-12-02  |
> +------+-------------+
> 
> On Mon, Dec 21, 2015 at 2:18 AM, Petri Lehtonen <Pe...@affecto.com>
> wrote:
> 
>> Hi all,
>> 
>> I'm using Tableau with MapR Drill ODBC drivers.  I have successfully set
>> up the drivers and am able to query the data from our source which is
>> basically parquet files. All seems well as long as I'm using all rows from
>> the source but when I try to limit the size of the results by grouping and
>> rolling up the data I came across something that feels like a bug:
>> 
>> When I use any functions that return dates and add a group by to my query,
>> the column alias in the underlying sql won't show up in the resultset.
>> 
>> Plain select from the source works:
>> 
>> SELECT
>>  valid_from as valid_from_raw,
>>  date_add(to_date(valid_from),1)as valid_from_to_date
>> FROM
>>  dw.`/directory/parquet-file`
>> WHERE
>>  valid_from = '1356991200000'
>> 
>> ---------------+---------------------
>> valid_from_raw | valid_from_to_date
>> ---------------+---------------------
>> 1356991200000  | 2013-01-01
>> ---------------+---------------------
>> 
>> When you add GROUP BY to the SQL, Drill looses the original column alias
>> and replaces it with it's internal variable or something similar ($f1..n):
>> 
>> SELECT
>>  valid_from as valid_from_raw,
>>  date_add(to_date(valid_from),1) as valid_from_to_date
>> FROM
>>  dw.`/directory/parquet-file`
>> WHERE
>>  valid_from = '1356991200000'
>> GROUP BY
>>  valid_from as valid_from_raw,
>>  date_add(to_date(valid_from),1)
>> 
>> ---------------+-------------
>> valid_from_raw | $f1
>> ---------------+-------------
>> 1356991200000  | 2013-01-01
>> ---------------+-------------
>> 
>> And naturally this creates a problem with Tableau which is mapping the
>> column aliases from the initial SQL and fails to find the corresponding
>> aliases from the resultset. Only way to get this working is to use $f0..n
>> column aliases in the original query and rename them in Tableau...
>> Something that one definitely shouldn't do. I'd appreciate if anyone has a
>> solution or a more solid workaround.
>> 
>> Thanks,
>> Petri
>> 
>> 


Re: Column aliases lost when using Dates + GROUP BY in SQL

Posted by Aman Sinha <as...@maprtech.com>.
The aliases work for me in the following query similar to yours.  However,
I am using latest master branch and running directly through sqlline
command, not through Tableau.  Can you confirm what Drill version you are
using and check if you can repro the behavior through sqlline ?  If so, you
should file a JIRA.

select o_custkey as x, date_add(to_date(o_orderdate), 1) as y from
cp.`tpch/orders.parquet` where o_orderkey < 10 group by o_custkey,
date_add(to_date(o_orderdate), 1) limit 2;
+------+-------------+
|  x   |      y      |
+------+-------------+
| 370  | 1996-01-03  |
| 781  | 1996-12-02  |
+------+-------------+

On Mon, Dec 21, 2015 at 2:18 AM, Petri Lehtonen <Pe...@affecto.com>
wrote:

> Hi all,
>
> I'm using Tableau with MapR Drill ODBC drivers.  I have successfully set
> up the drivers and am able to query the data from our source which is
> basically parquet files. All seems well as long as I'm using all rows from
> the source but when I try to limit the size of the results by grouping and
> rolling up the data I came across something that feels like a bug:
>
> When I use any functions that return dates and add a group by to my query,
> the column alias in the underlying sql won't show up in the resultset.
>
> Plain select from the source works:
>
> SELECT
>   valid_from as valid_from_raw,
>   date_add(to_date(valid_from),1)as valid_from_to_date
> FROM
>   dw.`/directory/parquet-file`
> WHERE
>   valid_from = '1356991200000'
>
> ---------------+---------------------
> valid_from_raw | valid_from_to_date
> ---------------+---------------------
> 1356991200000  | 2013-01-01
> ---------------+---------------------
>
> When you add GROUP BY to the SQL, Drill looses the original column alias
> and replaces it with it's internal variable or something similar ($f1..n):
>
> SELECT
>   valid_from as valid_from_raw,
>   date_add(to_date(valid_from),1) as valid_from_to_date
> FROM
>   dw.`/directory/parquet-file`
> WHERE
>   valid_from = '1356991200000'
> GROUP BY
>   valid_from as valid_from_raw,
>   date_add(to_date(valid_from),1)
>
> ---------------+-------------
> valid_from_raw | $f1
> ---------------+-------------
> 1356991200000  | 2013-01-01
> ---------------+-------------
>
> And naturally this creates a problem with Tableau which is mapping the
> column aliases from the initial SQL and fails to find the corresponding
> aliases from the resultset. Only way to get this working is to use $f0..n
> column aliases in the original query and rename them in Tableau...
> Something that one definitely shouldn't do. I'd appreciate if anyone has a
> solution or a more solid workaround.
>
> Thanks,
> Petri
>
>