Posted to issues@drill.apache.org by "Charles Givre (JIRA)" <ji...@apache.org> on 2019/06/24 19:19:00 UTC

[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries

     [ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Givre updated DRILL-7308:
---------------------------------
    Description: 
I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file, you get the following metadata:

SELECT * FROM dfs.test.`domains.csvh` LIMIT 1

{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}

There are two issues here:
1.  VARCHAR is now reported with precision: VARCHAR(0, 0).
2.  There are twice as many metadata entries as there should be: two for a single column.

Additionally, if you query a regular CSV file, without the columns extracted, you get the following:

"rows": [
    {
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],

  was:
I'm noticing some strange behavior with the newest version of Drill.  If you query a CSV file, you get the following metadata:
 
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
 
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
 
 
There are two issues here:
1.  VARCHAR now has precision 
2.  There are twice as many columns as there should be.
 
Additionally, if you query a regular CSV, without the columns extracted, you get the following:
 
"rows": [
    {
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
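
The two symptoms in this report can be checked mechanically against a response body. The following is a minimal sketch, not part of the original report: the helper name `find_metadata_problems` is hypothetical, and it assumes that exactly one metadata entry is expected per entry in "columns" and that a bare VARCHAR without arguments is the expected form. The response body is copied from the first query in the description.

```python
import json
import re

# Response body copied from the first query in the report.
RESPONSE = json.loads("""
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": ["domain"],
  "rows": [{"domain": "thedataist.com"}],
  "metadata": ["VARCHAR(0, 0)", "VARCHAR(0, 0)"],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
""")

# Matches a type string that carries explicit arguments, e.g. "VARCHAR(0, 0)".
TYPE_WITH_ARGS = re.compile(r"^(\w+)\((.*)\)$")


def find_metadata_problems(response):
    """Return descriptions of the two symptoms named in the report.

    Hypothetical helper: assumes one metadata entry is expected per column.
    """
    problems = []
    columns = response.get("columns", [])
    metadata = response.get("metadata", [])

    # Symptom 2: more metadata entries than columns.
    if len(metadata) != len(columns):
        problems.append(
            f"{len(metadata)} metadata entries for {len(columns)} column(s)"
        )

    # Symptom 1: VARCHAR reported with explicit arguments.
    for entry in metadata:
        match = TYPE_WITH_ARGS.match(entry)
        if match and match.group(1) == "VARCHAR":
            problems.append(f"VARCHAR reported with arguments: {entry}")

    return problems


if __name__ == "__main__":
    for problem in find_metadata_problems(RESPONSE):
        print(problem)
```

For the response above, the sketch flags the length mismatch once and each VARCHAR(0, 0) entry separately. A response with one plain VARCHAR entry per column yields an empty list.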
 


> Incorrect Metadata from text file queries
> -----------------------------------------
>
>                 Key: DRILL-7308
>                 URL: https://issues.apache.org/jira/browse/DRILL-7308
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Priority: Major
>         Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>


