Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2017/03/09 11:59:38 UTC

[jira] [Created] (DRILL-5336) Columns returned by select over CTAS created parquet are not in correct order.

Khurram Faraaz created DRILL-5336:
-------------------------------------

             Summary: Columns returned by select over CTAS created parquet are not in correct order.
                 Key: DRILL-5336
                 URL: https://issues.apache.org/jira/browse/DRILL-5336
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.10.0
            Reporter: Khurram Faraaz


The columns in the result of a SELECT over a CTAS-created parquet file are returned in the wrong order.
Column col_int should appear before col_chr; however, col_chr appears before col_int in the result of the SELECT.
Note that the CTAS's SELECT statement contains a UNION ALL,
and each of the SELECT statements in the UNION ALL has an ORDER BY.

The problem seems to be with the use of the LIMIT clause in the SELECT on dfs.tmp.temp_tbl_unall.
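
The following variations of the failing query could help narrow this down. This is only a sketch of possible checks, and no results are claimed for them here: the first drops LIMIT, the second drops ORDER BY, and the third replaces the star with an explicit column list, on the assumption that an explicit projection fixes the output order.

{noformat}
-- Same query without LIMIT: does the column order still change?
SELECT * FROM dfs.tmp.temp_tbl_unall ORDER BY col_int;

-- LIMIT without ORDER BY: isolates LIMIT from the sort.
SELECT * FROM dfs.tmp.temp_tbl_unall LIMIT 100;

-- Explicit column list instead of *: possible workaround (assumption, not verified).
SELECT col_int, col_chr, col_vrchr1, col_vrchr2 FROM dfs.tmp.temp_tbl_unall ORDER BY col_int LIMIT 100;
{noformat}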

Here is the parquet schema for the CTAS-created parquet file. Note that col_int appears before col_chr in the parquet schema too, matching the column order in the CTAS's SELECT list.

{noformat}
[root@centos-01 parquet-tools]# hadoop fs -get /tmp/temp_tbl_unall/0_0_0.parquet .
[root@centos-01 parquet-tools]# ./parquet-schema 0_0_0.parquet
message root {
  optional int32 col_int;
  optional binary col_chr (UTF8);
  optional binary col_vrchr1 (UTF8);
  optional binary col_vrchr2 (UTF8);
}
{noformat}

Drill 1.10.0, git commit id: 3dfb4972

{noformat}
0: jdbc:drill:schema=dfs.tmp> CREATE TABLE dfs.tmp.temp_tbl_unall as ( SELECT col_int, col_chr, col_vrchr1, col_vrchr2 FROM typeall_l order by col_int ) UNION ALL ( SELECT col_int, col_chr, col_vrchr1, col_vrchr2 FROM typeall_r order by col_int );
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1107                       |
+-----------+----------------------------+
1 row selected (0.381 seconds)
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM dfs.tmp.temp_tbl_unall ORDER BY col_int LIMIT 100;
+---------+---------+------------+------------+
| col_chr | col_int | col_vrchr1 | col_vrchr2 |
+---------+---------+------------+------------+
| MI | 0 | Felecia Gourd | NLBQMg9 |
| DE | 1 | Alvina Jenkins | f9MqJlnNettlCVGcShifgMgnzL5FrZmHysoMBe6kDtA |
| HI | 1 | Fredrick Vanderburg | eN3CNLW8FE5voAksuJCSYnMdJrVown7my6DiAlI8KhrG69kQoAxKFJmOHPVca1FjGyHWd5Ag53vvODvKB8YwqXcbDihjR0DDbed1cgs7L1tndiPRvU1OreN5ByB8pF0QisgwSBWRKRvS8RVOzA3CyxOpjyxVujRLLlctww0jWwn09m3iINTi6Delw |
| CA | 19 | John Doe | test string |
| CA | 19 | John Doe | test string |
...
 
 
| LA | 6854 | William Burk | 5krBT7wj8BkoiRUWV9HjkyIT1DRpPj6bNixK15g4gs9IEsKc5myCyzMKQk5k1 |
+----+------+---------------+-----------------+
| col_chr | col_int | col_vrchr1 | col_vrchr2 |
+----+------+---------------+-----------------+
| IN | 6870 | Caroline Bell | M2811poVmVJLuxqsHz0jzRSGrAJDXfl3UuE0Iz8ldqvRURURvq2dO4Q1358eiureI20NCGBl9lBpoKPc78TWS0gsWhIt280E8JZPQpj7lOJXnHUmvydDiBPgAzNoGn7SSP6xYlnMyBhvWRxB5NF3I9vszosjmpW1Yx7et56QvwLfWBb3unJPnrxVYXX5tAfeyednJ4A90aOE2dhMXy1wLwewMJ91SWBEUM8TU3aGikQ5Ax6dDhDBQLaP |
+----+------+---------------+-----------------+
100 rows selected (0.173 seconds)
{noformat}


