You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/07/07 01:09:00 UTC

[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

    [ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879779#comment-16879779 ] 

Paul Rogers commented on DRILL-7308:
------------------------------------

Turns out that the change mentioned above *does not* work. The incorrect REST code above checks if precision is set to determine if it is non-zero. Tried modifying the {{PrimitiveColumnMetadata}} class to not set the precision if it is zero. But, this caused some TPC-H tests to fail. It seems that other code relies on the precision being set, even if zero.

So, the only solution is to fix the REST code as described above; we can't work around the problem by mucking with other parts of Drill. 

> Incorrect Metadata from text file queries
> -----------------------------------------
>
>                 Key: DRILL-7308
>                 URL: https://issues.apache.org/jira/browse/DRILL-7308
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Priority: Major
>         Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If you query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [}
>    {       "domain": "thedataist.com"     }  ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
> Additionally, if you query a regular CSV, without the columns extracted, you get the following:
> {code:json}
>  "rows": [
>  { 
>       "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
>   ],
>    "metadata": [
>      "VARCHAR(0, 0)",
>      "VARCHAR(0, 0)"
>    ],
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)