Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/03/06 20:16:00 UTC

[jira] [Created] (DRILL-7080) Inconsistent behavior with wildcard and partition columns

Paul Rogers created DRILL-7080:
----------------------------------

             Summary: Inconsistent behavior with wildcard and partition columns
                 Key: DRILL-7080
                 URL: https://issues.apache.org/jira/browse/DRILL-7080
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.15.0
            Reporter: Paul Rogers


Drill supports queries of the form:

{code:sql}
SELECT *, dir0 FROM `myTable`
{code}

Where `myTable` is, say, a set of CSV files with columns "a", "b" and "c". As shown in the (soon to be submitted) {{TestCsvWithHeaders}} test, the behavior of partition columns is wildly inconsistent and nearly unusable. This ticket focuses on one specific issue: the query above results in a schema like (dir0, a, b, c, dir00). That is:

* The wildcard generates "dir0", "dir1" columns.
* The Project operator inserts a second column, "dir00" as type Nullable Int.

This behavior is surprising as the following query produces the expected result:

{code:sql}
SELECT *, filename FROM `myTable`
{code}

That is, the above produces a schema of the form (a, b, c, filename) with "filename" of the expected type: VARCHAR.

This appears to be a bug somewhere in the project operator and/or the planner, but I've not tracked down the root cause.

The workaround is to either:

1. Omit the explicit "dir0" column when using the wildcard, or
2. Avoid the wildcard: list columns explicitly, including the partition columns.
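
As a rough sketch of the two workarounds, assuming the same `myTable` with columns "a", "b" and "c" and a single partition level (these queries illustrate the form only and were not run as part of this report):

{code:sql}
-- Workaround 1: wildcard only; the wildcard expansion already supplies the partition columns.
SELECT * FROM `myTable`

-- Workaround 2: no wildcard; name the data columns and the partition column explicitly.
SELECT a, b, c, dir0 FROM `myTable`
{code}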

Given how late this bug is being filed, I would guess that few people actually use this feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)