Posted to issues@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2020/02/20 12:32:00 UTC

[jira] [Updated] (DRILL-7595) Change of data type from bigint to int when parquet with multiple fragment

     [ https://issues.apache.org/jira/browse/DRILL-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-7595:
------------------------
    Description: 
Like DRILL-7104, there is a bug that changes the column type from BIGINT to INT when a Parquet table is written in multiple fragments.

With a file containing few rows, all is fine: we store a BIGINT and really get a BIGINT in the Parquet file.
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 as BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_0      | 1500                      |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
{code}
With a file containing "enough" rows, there is a problem: we store a BIGINT but unfortunately get an INT in the Parquet file.
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 as BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| INT    |
+--------+
{code}
 
It's not really satisfactory, but note that there is a trick to avoid this problem: use CAST('0' AS BIGINT) instead of CAST(0 AS BIGINT).
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST('0' as BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
{code}


 

  was:
Like DRILL-7104, there is a bug that changes the column type from BIGINT to INT when a Parquet table is written in multiple fragments.

With a file containing few rows, all is fine: we store a BIGINT and really get a BIGINT in the Parquet file.
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 as BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_0      | 1500                      |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
{code}
With a file containing "enough" rows, there is a problem: we store a BIGINT but unfortunately get an INT in the Parquet file.
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 as BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| INT    |
+--------+
{code}
 
It's not really satisfactory, but note that there is a trick to avoid this problem: use CAST('0' AS BIGINT) instead of CAST(0 AS BIGINT).
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST('0' as BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
{code}


 


> Change of data type from bigint to int when parquet with multiple fragment
> --------------------------------------------------------------------------
>
>                 Key: DRILL-7595
>                 URL: https://issues.apache.org/jira/browse/DRILL-7595
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.17.0
>            Reporter: benj
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)