You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Khurram Faraaz <kf...@maprtech.com> on 2015/06/24 02:28:26 UTC
Wrong results - windowing query over view
Windowing query over a view returns wrong results when used with and
without a group by clause. Please let me know if this is a planning bug ?
Postgres does not support the query where we use a group by.
DDL used for view creation was,
create view vwOnParq (col_int, col_bigint, col_char_2, col_vchar_52,
col_tmstmp, col_dt, col_booln, col_dbl, col_tm) as select col_int,
col_bigint, col_char_2, col_vchar_52, col_tmstmp, col_dt, col_booln,
col_dbl, col_tm from `tblForView/0_0_0.parquet`;
The two queries are,
0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq;
*+---------+*
*| **EXPR$0 ** |*
*+---------+*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*| *-19 * |*
*+---------+*
30 rows selected (0.26 seconds)
Explain plan for the above query
*| *00-00 Screen
00-01 Project(EXPR$0=[$0])
00-02 Project($0=[$9])
00-03 Window(window#0=[window(partition {} order by [] range
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
00-04 Project(col_int=[$4], col_bigint=[$7], col_char_2=[$2],
col_vchar_52=[$1], col_tmstmp=[$0], col_dt=[$3], col_booln=[$6],
col_dbl=[$8], col_tm=[$5])
00-05 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tmp/tblForView/0_0_0.parquet]],
selectionRoot=/tmp/tblForView/0_0_0.parquet, numFiles=1,
columns=[`col_int`, `col_bigint`, `col_char_2`, `col_vchar_52`,
`col_tmstmp`, `col_dt`, `col_booln`, `col_dbl`, `col_tm`]]])
0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq
group by col_char_2;
*+---------+*
*| **EXPR$0 ** |*
*+---------+*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*| *AZ * |*
*+---------+*
18 rows selected (0.27 seconds)
Explain plan for the above query that uses group by
*| *00-00 Screen
00-01 Project(EXPR$0=[$0])
00-02 Project($0=[$2])
00-03 Window(window#0=[window(partition {} order by [] range
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
00-04 HashAgg(group=[{0}], agg#0=[MIN($1)])
00-05 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tmp/tblForView/0_0_0.parquet]],
selectionRoot=/tmp/tblForView/0_0_0.parquet, numFiles=1,
columns=[`col_char_2`, `col_int`]]])
Thanks,
Khurram
Re: Wrong results - windowing query over view
Posted by Khurram Faraaz <kf...@maprtech.com>.
The second query (below) that uses group by is not supported by Postgres, I
will file a JIRA to block that query.
SELECT MIN(col_int) OVER() FROM vwOnParq group by col_char_2;
Output from Postgres
postgres=# select min(col_int) over() from all_typs_tbl group by col_char_2;
ERROR: column "all_typs_tbl.col_int" must appear in the GROUP BY clause or
be used in an aggregate function
LINE 1: select min(col_int) over() from all_typs_tbl group by col_ch...
Querying over that original parquet file using which the view was created,
we see an assertion error
0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM
`tblForView/0_0_0.parquet` group by col_char_2;
*Error: SYSTEM ERROR: java.lang.AssertionError: Internal error: while
converting MIN(`tblForView/0_0_0.parquet`.`col_int`)*
*[Error Id: e8ed279d-aa8c-4db1-9906-5dd7fdecaac2 on centos-02.qa.lab:31010]
(state=,code=0)*
On Tue, Jun 23, 2015 at 5:31 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:
> What happens if you run the queries on the original parquet files and not
> the views ?
>
> On Tue, Jun 23, 2015 at 5:28 PM, Khurram Faraaz <kf...@maprtech.com>
> wrote:
>
> > Windowing query over a view returns wrong results when used with and
> > without a group by clause. Please let me know if this is a planning bug ?
> > Postgres does not support the query where we use a group by.
> >
> > DDL used for view creation was,
> >
> > create view vwOnParq (col_int, col_bigint, col_char_2, col_vchar_52,
> > col_tmstmp, col_dt, col_booln, col_dbl, col_tm) as select col_int,
> > col_bigint, col_char_2, col_vchar_52, col_tmstmp, col_dt, col_booln,
> > col_dbl, col_tm from `tblForView/0_0_0.parquet`;
> >
> >
> > The two queries are,
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq;
> >
> > *+---------+*
> >
> > *| **EXPR$0 ** |*
> >
> > *+---------+*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *| *-19 * |*
> >
> > *+---------+*
> >
> > 30 rows selected (0.26 seconds)
> >
> >
> > Explain plan for the above query
> >
> >
> > *| *00-00 Screen
> >
> > 00-01 Project(EXPR$0=[$0])
> >
> > 00-02 Project($0=[$9])
> >
> > 00-03 Window(window#0=[window(partition {} order by [] range
> > between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
> >
> > 00-04 Project(col_int=[$4], col_bigint=[$7], col_char_2=[$2],
> > col_vchar_52=[$1], col_tmstmp=[$0], col_dt=[$3], col_booln=[$6],
> > col_dbl=[$8], col_tm=[$5])
> >
> > 00-05 Scan(groupscan=[ParquetGroupScan
> > [entries=[ReadEntryWithPath
> [path=maprfs:///tmp/tblForView/0_0_0.parquet]],
> > selectionRoot=/tmp/tblForView/0_0_0.parquet, numFiles=1,
> > columns=[`col_int`, `col_bigint`, `col_char_2`, `col_vchar_52`,
> > `col_tmstmp`, `col_dt`, `col_booln`, `col_dbl`, `col_tm`]]])
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq
> > group by col_char_2;
> >
> > *+---------+*
> >
> > *| **EXPR$0 ** |*
> >
> > *+---------+*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *| *AZ * |*
> >
> > *+---------+*
> >
> > 18 rows selected (0.27 seconds)
> >
> >
> > Explain plan for the above query that uses group by
> >
> >
> > *| *00-00 Screen
> >
> > 00-01 Project(EXPR$0=[$0])
> >
> > 00-02 Project($0=[$2])
> >
> > 00-03 Window(window#0=[window(partition {} order by [] range
> > between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
> >
> > 00-04 HashAgg(group=[{0}], agg#0=[MIN($1)])
> >
> > 00-05 Scan(groupscan=[ParquetGroupScan
> > [entries=[ReadEntryWithPath
> [path=maprfs:///tmp/tblForView/0_0_0.parquet]],
> > selectionRoot=/tmp/tblForView/0_0_0.parquet, numFiles=1,
> > columns=[`col_char_2`, `col_int`]]])
> >
> >
> > Thanks,
> >
> > Khurram
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
> <http://www.mapr.com/>
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>
Re: Wrong results - windowing query over view
Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
What happens if you run the queries on the original parquet files and not
the views ?
On Tue, Jun 23, 2015 at 5:28 PM, Khurram Faraaz <kf...@maprtech.com>
wrote:
> Windowing query over a view returns wrong results when used with and
> without a group by clause. Please let me know if this is a planning bug ?
> Postgres does not support the query where we use a group by.
>
> DDL used for view creation was,
>
> create view vwOnParq (col_int, col_bigint, col_char_2, col_vchar_52,
> col_tmstmp, col_dt, col_booln, col_dbl, col_tm) as select col_int,
> col_bigint, col_char_2, col_vchar_52, col_tmstmp, col_dt, col_booln,
> col_dbl, col_tm from `tblForView/0_0_0.parquet`;
>
>
> The two queries are,
>
>
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq;
>
> *+---------+*
>
> *| **EXPR$0 ** |*
>
> *+---------+*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *| *-19 * |*
>
> *+---------+*
>
> 30 rows selected (0.26 seconds)
>
>
> Explain plan for the above query
>
>
> *| *00-00 Screen
>
> 00-01 Project(EXPR$0=[$0])
>
> 00-02 Project($0=[$9])
>
> 00-03 Window(window#0=[window(partition {} order by [] range
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
>
> 00-04 Project(col_int=[$4], col_bigint=[$7], col_char_2=[$2],
> col_vchar_52=[$1], col_tmstmp=[$0], col_dt=[$3], col_booln=[$6],
> col_dbl=[$8], col_tm=[$5])
>
> 00-05 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/tblForView/0_0_0.parquet]],
> selectionRoot=/tmp/tblForView/0_0_0.parquet, numFiles=1,
> columns=[`col_int`, `col_bigint`, `col_char_2`, `col_vchar_52`,
> `col_tmstmp`, `col_dt`, `col_booln`, `col_dbl`, `col_tm`]]])
>
>
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq
> group by col_char_2;
>
> *+---------+*
>
> *| **EXPR$0 ** |*
>
> *+---------+*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *| *AZ * |*
>
> *+---------+*
>
> 18 rows selected (0.27 seconds)
>
>
> Explain plan for the above query that uses group by
>
>
> *| *00-00 Screen
>
> 00-01 Project(EXPR$0=[$0])
>
> 00-02 Project($0=[$2])
>
> 00-03 Window(window#0=[window(partition {} order by [] range
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
>
> 00-04 HashAgg(group=[{0}], agg#0=[MIN($1)])
>
> 00-05 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/tblForView/0_0_0.parquet]],
> selectionRoot=/tmp/tblForView/0_0_0.parquet, numFiles=1,
> columns=[`col_char_2`, `col_int`]]])
>
>
> Thanks,
>
> Khurram
>
--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>