Posted to issues@drill.apache.org by "Charles Givre (JIRA)" <ji...@apache.org> on 2019/06/24 19:19:00 UTC
[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre updated DRILL-7308:
---------------------------------
Description:
I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file with a header (.csvh), you get the following metadata:

SELECT * FROM dfs.test.`domains.csvh` LIMIT 1

{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}

There are two issues here:
1. VARCHAR is now reported with a precision, shown as VARCHAR(0, 0).
2. There are twice as many metadata entries as there should be: two for the single "domain" column.
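The two symptoms can be expressed as a simple consistency check on the response. This is a minimal sketch, assuming the REST response has the shape shown above (a "columns" list that should have exactly one "metadata" entry per column). It encodes the reporter's expectations (one entry per column, no precision on VARCHAR). The helper find_metadata_problems is hypothetical and is not part of Drill.

```python
import json
import re

# Response body copied from the report above.
RESPONSE = """
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": ["domain"],
  "rows": [{"domain": "thedataist.com"}],
  "metadata": ["VARCHAR(0, 0)", "VARCHAR(0, 0)"],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
"""

# Matches a type string with a "(n, m)" suffix, e.g. "VARCHAR(0, 0)".
_PARAMS = re.compile(r"^(\w+)\(\s*\d+\s*,\s*\d+\s*\)$")


def find_metadata_problems(response: dict) -> list[str]:
    """Hypothetical check: one metadata entry per column, no precision on VARCHAR."""
    problems = []
    columns = response.get("columns", [])
    metadata = response.get("metadata", [])
    # Symptom 2: metadata count should equal column count.
    if len(metadata) != len(columns):
        problems.append(f"{len(metadata)} metadata entries for {len(columns)} column(s)")
    # Symptom 1: VARCHAR reported with a parenthesised precision suffix.
    for entry in metadata:
        match = _PARAMS.match(entry)
        if match and match.group(1) == "VARCHAR":
            problems.append(f"unexpected precision on {entry}")
    return problems


if __name__ == "__main__":
    for problem in find_metadata_problems(json.loads(RESPONSE)):
        print(problem)
```

Against the response above, the check reports the count mismatch once and the VARCHAR precision once per metadata entry.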

Additionally, if you query a regular CSV file, without the column names extracted from a header, you get the following (excerpt of the response):

"rows": [
  {
    "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
  }
],
"metadata": [
  "VARCHAR(0, 0)",
  "VARCHAR(0, 0)"
],
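The excerpt above has no top-level "columns" list, so a sketch of the same mismatch has to count the fields of a row instead. It also shows that the single "columns" value arrives as a JSON-encoded string that json.loads can decode. This assumes the excerpt is wrapped in braces to form a complete JSON object. The helper row_width_vs_metadata is hypothetical.

```python
import json

# Excerpt from the report, wrapped in braces so it parses as a JSON object.
EXCERPT = r"""
{
  "rows": [
    {"columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"}
  ],
  "metadata": ["VARCHAR(0, 0)", "VARCHAR(0, 0)"]
}
"""


def row_width_vs_metadata(response: dict) -> tuple[int, int]:
    """Hypothetical helper: (fields in the first row, number of metadata entries)."""
    first_row = response["rows"][0]
    return len(first_row), len(response["metadata"])


response = json.loads(EXCERPT)
width, n_meta = row_width_vs_metadata(response)
print(f"{width} field(s) per row, {n_meta} metadata entries")

# The single "columns" field holds the line's values as a JSON-encoded array string.
line_fields = json.loads(response["rows"][0]["columns"])
print(line_fields)
```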
> Incorrect Metadata from text file queries
> -----------------------------------------
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)