Posted to dev@drill.apache.org by "benj (JIRA)" <ji...@apache.org> on 2019/03/14 14:59:00 UTC
[jira] [Created] (DRILL-7104) Change of data type when parquet with
multiple fragment
benj created DRILL-7104:
---------------------------
Summary: Change of data type when parquet with multiple fragment
Key: DRILL-7104
URL: https://issues.apache.org/jira/browse/DRILL-7104
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: benj
When a Parquet table is created with a column filled only with "CAST(NULL AS VARCHAR)" and the table is written as several fragments, the type of that column is read back as INT instead of VARCHAR.
First, create a +Parquet table with only one fragment+: everything is fine (the type of "demo" is correct).
{code:java}
CREATE TABLE ....`bug` AS
(SELECT CAST(NULL AS VARCHAR) AS demo
, md5(CAST(rand() AS VARCHAR)) AS jam
FROM ....`onebigfile` LIMIT 1000000);
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 1000000 |
SELECT drilltypeof(demo) AS goodtype FROM ....`bug` LIMIT 1;
+--------------------+
| goodtype |
+--------------------+
| VARCHAR |
{code}
Second, create a +Parquet table with at least 2 fragments+: the type of "demo" changes to INT.
{code:java}
CREATE TABLE ....`bug` AS
((SELECT CAST(NULL AS VARCHAR) AS demo
,md5(CAST(rand() AS VARCHAR)) AS jam
FROM ....`onebigfile` LIMIT 1000000)
UNION
(SELECT CAST(NULL AS VARCHAR) AS demo
,md5(CAST(rand() AS VARCHAR)) AS jam
FROM ....`onebigfile` LIMIT 1000000));
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 1_1 | 1000276 |
| 1_0 | 999724 |
SELECT drilltypeof(demo) AS badtype FROM ....`bug` LIMIT 1;
+--------------------+
| badtype |
+--------------------+
| INT |
{code}
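This change of type is really terrible: the same CAST(NULL AS VARCHAR) column is read back as VARCHAR or as INT depending only on how many fragments were written.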
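A possible way to narrow this down (an untested sketch, not a confirmed diagnosis): Drill exposes an implicit `filename` column for file-based tables, so grouping on it together with drilltypeof() should show whether every Parquet file of the table reports INT or only some of them. The table path is elided exactly as in the queries above, and the aliases demo_type and row_count are only illustrative names.
{code:sql}
-- Sketch: report the type Drill assigns to "demo" for each Parquet file of the table.
-- Assumes the implicit "filename" column is available for this storage plugin.
SELECT filename
, drilltypeof(demo) AS demo_type
, COUNT(*) AS row_count
FROM ....`bug`
GROUP BY filename, drilltypeof(demo);
{code}
If both files report INT, the problem is not tied to a single fragment; if the result is mixed, the type picked by "LIMIT 1" may simply depend on which file is read first.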
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)