You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/11/01 17:27:01 UTC

[jira] [Commented] (DRILL-5920) Drill incorrectly projects column aliases to scan operator

    [ https://issues.apache.org/jira/browse/DRILL-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234417#comment-16234417 ] 

Paul Rogers commented on DRILL-5920:
------------------------------------

Here are some questions.

I would like to know, is it expected that a column alias should appear in the project list given to a scan operator? If we expect this, how do we differentiate between the actual table column and the alias?

Or, do we just go ahead and create a null int column (or actual data column if “col1” did happen to be a real column), throw it away later, and replace it with the real data?

How would we handle the perfectly fine, albeit confusing, query of:

{code}
SELECT col1 AS col2, col2 AS col1 FROM ...
{code}

The problem I'm currently facing is that the text reader wants to allow only certain kinds of projection:

* SELECT *
* SELECT columns
* SELECT a, b, c
* SELECT

The last is an empty select list as is generated by a SELECT COUNT(*) query.

But, because aliases are pushed to the data source, the picture becomes very muddy:

* SELECT *
* SELECT columns, alias
* SELECT a, b, c, alias
* SELECT alias

That is, the meaning of a COUNT(*) query changes. Extra columns are created at other times. In the wildcard case, the alias column is not created, creating two distinct situations that downstream operators must handle (materialized aliases and non-materialized aliases.)

For all these reasons, the current behavior seems like a bug, not a feature. But, this can be open to debate.


> Drill incorrectly projects column aliases to scan operator
> ----------------------------------------------------------
>
>                 Key: DRILL-5920
>                 URL: https://issues.apache.org/jira/browse/DRILL-5920
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The {{TestNewTextReader.ensureColumnNameDisplayedinError}} unit test runs this query:
> {code}
> select max(columns[1]) as col1
> from cp.`textinput/input1.csv`
> where col1 is not null
> {code}
> The following appears in the {{SubScan}} for the {{TextFormatPlugin}}:
> {noformat}
> [`col1`, `columns`[1]]
> {noformat}
> This is clearly wrong. The actual table column is {{columns}} (and, specifically, element 1.) {{col1} is an alias that should never have been pushed down to the data source because the data source does not know about aliases.
> Further, the projection list makes no distinction between the "real" and "alias" columns, so, to the data source, both look like real table columns.
> The current workaround is to create a nullable int column for {{col1}} which is, presumably, replaced by a later projection operator.
> Because this behavior is wrong, we must think though all the possible failure cases and how to handle them in this incorrect design. What if the alias matches an (expensive) table column? What if the alias is the same as some base column in the same query?
> {code}
> SELECT a as b, b as c FROM ...
> {code}
> Incorrect name handling may work in many cases, but it does lead to problems because the behavior is not following the accepted SQL standards.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)