Posted to issues@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2020/02/05 10:46:00 UTC

[jira] [Created] (DRILL-7569) dir0 problem reader - when path with wildcard and column named dir0

benj created DRILL-7569:
---------------------------

             Summary: dir0 problem reader - when path with wildcard and column named dir0
                 Key: DRILL-7569
                 URL: https://issues.apache.org/jira/browse/DRILL-7569
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.17.0
            Reporter: benj


If a file with named columns (such as csvh, parquet, or json) contains a column named *dir0* (or any name matching dir[0-9]+), queries that use a wildcard in the path can fail.

{code:sql}
apache drill> SELECT * FROM dfs.tmp.`REP/exa.csvh`;
+---------+------+
|  dir0   |  a   |
+---------+------+
| coldir0 | cola |
+---------+------+

apache drill> SELECT * FROM dfs.tmp.`R*/exa.csvh`;
Error: INTERNAL_ERROR ERROR: Failure while setting up text reader for file file:/tmp/REP/exa.csvh
{code}

The error message differs depending on the input file type:
{noformat}
CSVH => Error: INTERNAL_ERROR ERROR: Failure while setting up text reader for file file:...
PARQUET => Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
Message: Failure in setting up reader
Parquet Metadata:...
JSON => Error: INTERNAL_ERROR ERROR: org.apache.drill.exec.exception.SchemaChangeException: It's not allowed to have regular field and implicit field share common name dir0. Either change regular field name in datasource, or change the default implicit field names.
{noformat}
Note that the JSON error message is the most relevant and makes the problem faster to identify (even though, to my knowledge, the default implicit field name dir* cannot be changed).
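
On the point about changing the default implicit field name: Drill's documented options list does include a partition column label setting, with default value "dir". The following is an untested sketch; it assumes the option can be set at session level and that renaming the label is enough to avoid the clash with a regular column named dir0. The label value 'partdir' is only an illustration.
{code:sql}
-- Assumption: renaming the directory label removes the name clash with the data column dir0
ALTER SESSION SET `drill.exec.storage.file.partition.column.label` = 'partdir';

-- Same wildcard query as above; the directory column would then be partdir0 instead of dir0
SELECT * FROM dfs.tmp.`R*/exa.csvh`;
{code}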

I know that dir0 should be avoided as a column name. But when creating a table, it is "easy" to use a "SELECT *", which will include dir0 (and the other dir* columns) if the path contains a wildcard.
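
As a sketch of how a table can be created without carrying dir0 into it, the columns can be listed explicitly, or the directory column can be renamed with an alias. The table names (out_explicit, out_renamed), the source file name src.csvh, the columns a and b, and the alias src_dir are hypothetical and only serve the illustration.
{code:sql}
-- Explicit column list: no dir0 column is written to the new table
CREATE TABLE dfs.tmp.`out_explicit` AS
SELECT a, b FROM dfs.tmp.`R*/src.csvh`;

-- Keep the directory information, but under a name that does not match dir[0-9]+
CREATE TABLE dfs.tmp.`out_renamed` AS
SELECT dir0 AS src_dir, a, b FROM dfs.tmp.`R*/src.csvh`;
{code}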

I have no good idea for solving this problem, but it would be worth finding a way to avoid falling into this trap.
Maybe *dir** should not appear automatically with _SELECT *_ but should require an explicit call like _SELECT dir0, dir1, *_ (maybe controlled by an option).
Maybe the error messages should be improved.
