Posted to dev@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2017/08/28 22:31:00 UTC
[jira] [Created] (DRILL-5747) Drill should put directory name field in same sequence w.r.t regular column for select * query

Jinfeng Ni created DRILL-5747:
---------------------------------

             Summary: Drill should put directory name field in same sequence w.r.t regular column for select * query
                 Key: DRILL-5747
                 URL: https://issues.apache.org/jira/browse/DRILL-5747
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni


Today, the star column * in Drill expands into a list of regular columns plus the directory name fields such as dir0 and dir1. However, Drill does not place the directory name fields relative to the regular columns in a consistent way.

For instance, for Parquet files, dir0 is placed after the list of regular columns.

{code}
select * from dfs.tmp.parquetTbl where dir0 = 1990;
+--------------+--------------+--------------+--------------+-------+
| N_NATIONKEY  |    N_NAME    | N_REGIONKEY  |  N_COMMENT   | dir0  |
+--------------+--------------+--------------+--------------+-------+
| 0            | [B@5527446   | 0            | [B@684fa264  | 1990  |
| 1            | [B@442e88bc  | 1            | [B@4b13119c  | 1990  |
| 2            | [B@50e93f45  | 1            | [B@138f483   | 1990  |
| 3            | [B@423cc515  | 1            | [B@23af07ac  | 1990  |
| 4            | [B@3820bf81  | 4            | [B@6dfccaf0  | 1990  |
| 5            | [B@6f6f8af9  | 0            | [B@40d1a97   | 1990  |
| 6            | [B@784cb194  | 3            | [B@731ea93f  | 1990  |
| 7            | [B@61f9a224  | 3            | [B@4c041bbc  | 1990  |
| 8            | [B@21b8faa1  | 2            | [B@774e7152  | 1990  |
| 9            | [B@3ef1fbaf  | 2            | [B@c2be72    | 1990  |
| 10           | [B@71652ec1  | 4            | [B@29e0bb10  | 1990  |
| 11           | [B@61192cea  | 4            | [B@3bd3e873  | 1990  |
| 12           | [B@5541f4b4  | 2            | [B@5d288126  | 1990  |
| 13           | [B@e371592   | 4            | [B@42692b88  | 1990  |
| 14           | [B@6a90fc8   | 0            | [B@454b16e2  | 1990  |
| 15           | [B@44cb72f8  | 0            | [B@8e91b11   | 1990  |
| 16           | [B@7feffda8  | 0            | [B@64f66236  | 1990  |
| 17           | [B@6ba9fb02  | 1            | [B@649e7786  | 1990  |
| 18           | [B@5fb93205  | 2            | [B@7783175b  | 1990  |
| 19           | [B@3f7294a9  | 3            | [B@7b7e03c9  | 1990  |
| 20           | [B@e2ac076   | 4            | [B@18c18a3e  | 1990  |
| 21           | [B@4a5af924  | 2            | [B@1a9ad09f  | 1990  |
| 22           | [B@29f6845e  | 3            | [B@776c4cd7  | 1990  |
| 23           | [B@6728f481  | 3            | [B@31cc7610  | 1990  |
| 24           | [B@665b2dfa  | 1            | [B@6c27ac95  | 1990  |
+--------------+--------------+--------------+--------------+-------+
{code}
Notice that in the above output, dir0 (value 1990) is the last column.

However, for JSON, dir0 is placed in front of the list of regular columns.

{code}
select * from dfs.tmp.jsonTbl where dir0 = 1990;
+-------+------+
| dir0  |  a   |
+-------+------+
| 1990  | 100  |
| 1990  | 200  |
+-------+------+
{code}

It would be good to present the directory name fields in the same position regardless of file format or storage plugin. IMHO, it makes sense to put the directory name fields in front of the list of regular columns (the behavior the JSON format shows today).

This ticket is opened to modify Drill's ScanBatch code to produce this consistent ordering.
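The proposed ordering rule can be sketched in a few lines. The sketch below is a minimal illustration, assuming the expanded star columns are available as a flat list of names and that directory columns are always named "dir" followed by an integer index. `StarColumnOrdering` and `directoryColumnsFirst` are hypothetical names, not part of Drill's ScanBatch API. An actual fix would operate on Drill's schema and value vectors rather than plain strings.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not Drill's API): puts directory columns ahead of regular columns. */
public final class StarColumnOrdering {

  // Assumption: implicit directory columns are named "dir" + index (dir0, dir1, ...).
  // The index is limited to 9 digits so Integer.parseInt below cannot overflow.
  private static final Pattern DIR_COLUMN =
      Pattern.compile("dir\\d{1,9}", Pattern.CASE_INSENSITIVE);

  private StarColumnOrdering() {}

  /** Returns directory columns (sorted by index) followed by regular columns (original order). */
  public static List<String> directoryColumnsFirst(List<String> columns) {
    List<String> dirColumns = new ArrayList<>();
    List<String> regularColumns = new ArrayList<>();
    for (String name : columns) {
      if (DIR_COLUMN.matcher(name).matches()) {
        dirColumns.add(name);
      } else {
        regularColumns.add(name);
      }
    }
    // Order dir0, dir1, ..., dir10 numerically, whatever order the reader emitted them in.
    dirColumns.sort(Comparator.comparingInt(name -> Integer.parseInt(name.substring(3))));
    List<String> result = new ArrayList<>(dirColumns);
    result.addAll(regularColumns);
    return result;
  }

  public static void main(String[] args) {
    // Parquet-style expansion from the first example: dir0 currently comes last.
    List<String> parquet = List.of("N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT", "dir0");
    // JSON-style expansion from the second example: dir0 already comes first.
    List<String> json = List.of("dir0", "a");
    System.out.println(directoryColumnsFirst(parquet)); // [dir0, N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT]
    System.out.println(directoryColumnsFirst(json));    // [dir0, a]
  }
}
{code}

The regular columns keep their relative order, so only the position of the directory fields changes. This mirrors the JSON behavior above. Sorting the directory columns by numeric index rather than by string keeps dir2 ahead of dir10.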




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)