Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2016/10/25 01:41:58 UTC

[jira] [Created] (DRILL-4960) Wrong columns after scanning Json files where some files have missing columns

Boaz Ben-Zvi created DRILL-4960:
-----------------------------------

             Summary: Wrong columns after scanning Json files where some files have missing columns
                 Key: DRILL-4960
                 URL: https://issues.apache.org/jira/browse/DRILL-4960
             Project: Apache Drill
          Issue Type: Bug
          Components:  Server
    Affects Versions: 1.8.0
         Environment: Mac
            Reporter: Boaz Ben-Zvi


(This problem may be more general than just JSON.)

To reproduce: scan two small JSON files (e.g. two copies of contrib/storage-mongo/src/test/resources/emp.json) where one of the files has a whole column removed (e.g. "last_name").
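
A minimal sketch of preparing such a pair of files, assuming one JSON object per line is an acceptable layout for the scan. The helper write_pair, the file names a.json / b.json, and the temporary directory are hypothetical; the two sample records are copied from the query output in this report.

```python
import json
import os
import tempfile

# Two sample records copied from the query output in this report.
RECORDS = [
    {"employee_id": 1101, "full_name": "Steve Eurich", "first_name": "Steve",
     "last_name": "Eurich", "position_id": 16, "rating": 23.0,
     "position": "Store T", "isFTE": True},
    {"employee_id": 1102, "full_name": "Mary Pierson", "first_name": "Mary",
     "last_name": "Pierson", "position_id": 16, "rating": 45.6,
     "position": "Store T", "isFTE": True},
]


def write_pair(directory, records, drop="last_name",
               full_file="a.json", stripped_file="b.json"):
    """Write the records twice: once complete, once without the `drop` column.

    File names are hypothetical; swap them to change which file sorts first.
    """
    os.makedirs(directory, exist_ok=True)
    full_path = os.path.join(directory, full_file)
    stripped_path = os.path.join(directory, stripped_file)

    # Complete copy: every record keeps all of its columns.
    with open(full_path, "w") as out:
        for rec in records:
            out.write(json.dumps(rec) + "\n")

    # Stripped copy: same records with the chosen column removed entirely.
    with open(stripped_path, "w") as out:
        for rec in records:
            kept = {k: v for k, v in rec.items() if k != drop}
            out.write(json.dumps(kept) + "\n")

    return full_path, stripped_path


if __name__ == "__main__":
    target = tempfile.mkdtemp(prefix="emp_")
    print(write_pair(target, RECORDS))
```

Calling write_pair again with full_file="b.json" and stripped_file="a.json" gives the renamed arrangement, where the file without "last_name" sorts first.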

A "normal" scan (the missing column shows up as nulls):

0: jdbc:drill:zk=local> select * from `drill/data/emp`;
+--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
| employee_id  |      full_name      | first_name  |  last_name   | position_id  | rating  |  position  | isFTE  |
+--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
| 1101         | Steve Eurich        | Steve       | Eurich       | 16           | 23.0    | Store T    | true   |
| 1102         | Mary Pierson        | Mary        | Pierson      | 16           | 45.6    | Store T    | true   |
| 1103         | Leo Jones           | Leo         | Jones        | 16           | 85.94   | Store Tem  | true   |
| 1104         | Nancy Beatty        | Nancy       | Beatty       | 16           | 97.16   | Store T    | false  |
| 1105         | Clara McNight       | Clara       | McNight      | 16           | 81.25   | Store      | true   |
| 1106         | null                | Marcella    | Isaacs       | 17           | 67.86   | Stor       | false  |
| 1107         | Charlotte Yonce     | Charlotte   | Yonce        | 17           | 52.17   | Stor       | true   |
| 1108         | Benjamin Foster     | Benjamin    | Foster       | 17           | 89.8    | Stor       | false  |
| 1109         | John Reed           | John        | Reed         | 17           | 12.9    | Store Per  | false  |
| 1110         | Lynn Kwiatkowski    | Lynn        | Kwiatkowski  | 17           | 25.76   | St         | true   |
| 1111         | Donald Vann         | Donald      | Vann         | 17           | 34.86   | Store Per  | false  |
| 1112         | null                | William     | Smith        | null         | 79.06   | St         | true   |
| 1113         | Amy Hensley         | Amy         | Hensley      | 17           | 82.96   | Store Pe   | false  |
| 1114         | Judy Owens          | Judy        | Owens        | 17           | 24.6    | Store Per  | true   |
| 1115         | Frederick Castillo  | Frederick   | Castillo     | 17           | 82.36   | S          | false  |
| 1116         | Phil Munoz          | Phil        | Munoz        | 17           | 97.63   | Store Per  | false  |
| 1117         | Lori Lightfoot      | Lori        | Lightfoot    | 17           | 39.16   | Store      | true   |
| 1            | Kumar               | Anil        | B            | 19           | 45.45   | Store      | true   |
| 2            | Kamesh              | Bh          | Venkata      | null         | 32.89   | Store      | true   |
| 1101         | Steve Eurich        | Steve       | null         | 16           | 23.0    | Store T    | true   |
| 1102         | Mary Pierson        | Mary        | null         | 16           | 45.6    | Store T    | true   |
| 1103         | Leo Jones           | Leo         | null         | 16           | 85.94   | Store Tem  | true   |
| 1104         | Nancy Beatty        | Nancy       | null         | 16           | 97.16   | Store T    | false  |
| 1105         | Clara McNight       | Clara       | null         | 16           | 81.25   | Store      | true   |
| 1106         | null                | Marcella    | null         | 17           | 67.86   | Stor       | false  |
| 1107         | Charlotte Yonce     | Charlotte   | null         | 17           | 52.17   | Stor       | true   |
| 1108         | Benjamin Foster     | Benjamin    | null         | 17           | 89.8    | Stor       | false  |
| 1109         | John Reed           | John        | null         | 17           | 12.9    | Store Per  | false  |
| 1110         | Lynn Kwiatkowski    | Lynn        | null         | 17           | 25.76   | St         | true   |
| 1111         | Donald Vann         | Donald      | null         | 17           | 34.86   | Store Per  | false  |
| 1112         | null                | William     | null         | null         | 79.06   | St         | true   |
| 1113         | Amy Hensley         | Amy         | null         | 17           | 82.96   | Store Pe   | false  |
| 1114         | Judy Owens          | Judy        | null         | 17           | 24.6    | Store Per  | true   |
| 1115         | Frederick Castillo  | Frederick   | null         | 17           | 82.36   | S          | false  |
| 1116         | Phil Munoz          | Phil        | null         | 17           | 97.63   | Store Per  | false  |
| 1117         | Lori Lightfoot      | Lori        | null         | 17           | 39.16   | Store      | true   |
| 1            | Kumar               | Anil        | null         | 19           | 45.45   | Store      | true   |
| 2            | Kamesh              | Bh          | null         | null         | 32.89   | Store      | true   |
+--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
38 rows selected (0.16 seconds)

But when the files are renamed so that the file that sorts first alphabetically becomes the second, that column ("last_name") does not show up:

0: jdbc:drill:zk=local> select * from foo;
+--------------+---------------------+-------------+--------------+---------+------------+--------+
| employee_id  |      full_name      | first_name  | position_id  | rating  |  position  | isFTE  |
+--------------+---------------------+-------------+--------------+---------+------------+--------+
| 1101         | Steve Eurich        | Steve       | 16           | 23.0    | Store T    | true   |
| 1102         | Mary Pierson        | Mary        | 16           | 45.6    | Store T    | true   |
| 1103         | Leo Jones           | Leo         | 16           | 85.94   | Store Tem  | true   |
| 1104         | Nancy Beatty        | Nancy       | 16           | 97.16   | Store T    | false  |
| 1105         | Clara McNight       | Clara       | 16           | 81.25   | Store      | true   |
| 1106         | null                | Marcella    | 17           | 67.86   | Stor       | false  |
| 1107         | Charlotte Yonce     | Charlotte   | 17           | 52.17   | Stor       | true   |
| 1108         | Benjamin Foster     | Benjamin    | 17           | 89.8    | Stor       | false  |
| 1109         | John Reed           | John        | 17           | 12.9    | Store Per  | false  |
| 1110         | Lynn Kwiatkowski    | Lynn        | 17           | 25.76   | St         | true   |
| 1111         | Donald Vann         | Donald      | 17           | 34.86   | Store Per  | false  |
| 1112         | null                | William     | null         | 79.06   | St         | true   |
| 1113         | Amy Hensley         | Amy         | 17           | 82.96   | Store Pe   | false  |
| 1114         | Judy Owens          | Judy        | 17           | 24.6    | Store Per  | true   |
| 1115         | Frederick Castillo  | Frederick   | 17           | 82.36   | S          | false  |
| 1116         | Phil Munoz          | Phil        | 17           | 97.63   | Store Per  | false  |
| 1117         | Lori Lightfoot      | Lori        | 17           | 39.16   | Store      | true   |
| 1            | Kumar               | Anil        | 19           | 45.45   | Store      | true   |
| 2            | Kamesh              | Bh          | null         | 32.89   | Store      | true   |
| 1101         | Steve Eurich        | Steve       | 16           | 23.0    | Store T    | true   |
| 1102         | Mary Pierson        | Mary        | 16           | 45.6    | Store T    | true   |
| 1103         | Leo Jones           | Leo         | 16           | 85.94   | Store Tem  | true   |
| 1104         | Nancy Beatty        | Nancy       | 16           | 97.16   | Store T    | false  |
| 1105         | Clara McNight       | Clara       | 16           | 81.25   | Store      | true   |
| 1106         | null                | Marcella    | 17           | 67.86   | Stor       | false  |
| 1107         | Charlotte Yonce     | Charlotte   | 17           | 52.17   | Stor       | true   |
| 1108         | Benjamin Foster     | Benjamin    | 17           | 89.8    | Stor       | false  |
| 1109         | John Reed           | John        | 17           | 12.9    | Store Per  | false  |
| 1110         | Lynn Kwiatkowski    | Lynn        | 17           | 25.76   | St         | true   |
| 1111         | Donald Vann         | Donald      | 17           | 34.86   | Store Per  | false  |
| 1112         | null                | William     | null         | 79.06   | St         | true   |
| 1113         | Amy Hensley         | Amy         | 17           | 82.96   | Store Pe   | false  |
| 1114         | Judy Owens          | Judy        | 17           | 24.6    | Store Per  | true   |
| 1115         | Frederick Castillo  | Frederick   | 17           | 82.36   | S          | false  |
| 1116         | Phil Munoz          | Phil        | 17           | 97.63   | Store Per  | false  |
| 1117         | Lori Lightfoot      | Lori        | 17           | 39.16   | Store      | true   |
| 1            | Kumar               | Anil        | 19           | 45.45   | Store      | true   |
| 2            | Kamesh              | Bh          | null         | 32.89   | Store      | true   |
+--------------+---------------------+-------------+--------------+---------+------------+--------+
38 rows selected (0.261 seconds)

But if requested explicitly, the column does show:

0: jdbc:drill:zk=local> select last_name from `drill/data/emp`;
+--------------+
|  last_name   |
+--------------+
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| null         |
| Eurich       |
| Pierson      |
| Jones        |
| Beatty       |
| McNight      |
| Isaacs       |
| Yonce        |
| Foster       |
| Reed         |
| Kwiatkowski  |
| Vann         |
| Smith        |
| Hensley      |
| Owens        |
| Castillo     |
| Munoz        |
| Lightfoot    |
| B            |
| Venkata      |
+--------------+
38 rows selected (0.159 seconds)

Things get even WORSE when a parallel plan is chosen -- some column data shows up under the wrong columns:

0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.084 seconds)
0: jdbc:drill:zk=local> select * from `drill/data/emp`;
+--------------+---------------------+-------------+--------------+---------+------------+------------+
| employee_id  |      full_name      | first_name  | position_id  | rating  |  position  |   isFTE    |
+--------------+---------------------+-------------+--------------+---------+------------+------------+
| 1101         | Steve Eurich        | Steve       | 16           | 23.0    | Store T    | true       |
| 1102         | Mary Pierson        | Mary        | 16           | 45.6    | Store T    | true       |
| 1103         | Leo Jones           | Leo         | 16           | 85.94   | Store Tem  | true       |
| 1104         | Nancy Beatty        | Nancy       | 16           | 97.16   | Store T    | false      |
| 1105         | Clara McNight       | Clara       | 16           | 81.25   | Store      | true       |
| 1106         | null                | Marcella    | 17           | 67.86   | Stor       | false      |
| 1107         | Charlotte Yonce     | Charlotte   | 17           | 52.17   | Stor       | true       |
| 1108         | Benjamin Foster     | Benjamin    | 17           | 89.8    | Stor       | false      |
| 1109         | John Reed           | John        | 17           | 12.9    | Store Per  | false      |
| 1110         | Lynn Kwiatkowski    | Lynn        | 17           | 25.76   | St         | true       |
| 1111         | Donald Vann         | Donald      | 17           | 34.86   | Store Per  | false      |
| 1112         | null                | William     | null         | 79.06   | St         | true       |
| 1113         | Amy Hensley         | Amy         | 17           | 82.96   | Store Pe   | false      |
| 1114         | Judy Owens          | Judy        | 17           | 24.6    | Store Per  | true       |
| 1115         | Frederick Castillo  | Frederick   | 17           | 82.36   | S          | false      |
| 1116         | Phil Munoz          | Phil        | 17           | 97.63   | Store Per  | false      |
| 1117         | Lori Lightfoot      | Lori        | 17           | 39.16   | Store      | true       |
| 1            | Kumar               | Anil        | 19           | 45.45   | Store      | true       |
| 2            | Kamesh              | Bh          | null         | 32.89   | Store      | true       |
| 1101         | Steve Eurich        | Steve       | Eurich       | 16      | 23.0       | Store T    |
| 1102         | Mary Pierson        | Mary        | Pierson      | 16      | 45.6       | Store T    |
| 1103         | Leo Jones           | Leo         | Jones        | 16      | 85.94      | Store Tem  |
| 1104         | Nancy Beatty        | Nancy       | Beatty       | 16      | 97.16      | Store T    |
| 1105         | Clara McNight       | Clara       | McNight      | 16      | 81.25      | Store      |
| 1106         | null                | Marcella    | Isaacs       | 17      | 67.86      | Stor       |
| 1107         | Charlotte Yonce     | Charlotte   | Yonce        | 17      | 52.17      | Stor       |
| 1108         | Benjamin Foster     | Benjamin    | Foster       | 17      | 89.8       | Stor       |
| 1109         | John Reed           | John        | Reed         | 17      | 12.9       | Store Per  |
| 1110         | Lynn Kwiatkowski    | Lynn        | Kwiatkowski  | 17      | 25.76      | St         |
| 1111         | Donald Vann         | Donald      | Vann         | 17      | 34.86      | Store Per  |
| 1112         | null                | William     | Smith        | null    | 79.06      | St         |
| 1113         | Amy Hensley         | Amy         | Hensley      | 17      | 82.96      | Store Pe   |
| 1114         | Judy Owens          | Judy        | Owens        | 17      | 24.6       | Store Per  |
| 1115         | Frederick Castillo  | Frederick   | Castillo     | 17      | 82.36      | S          |
| 1116         | Phil Munoz          | Phil        | Munoz        | 17      | 97.63      | Store Per  |
| 1117         | Lori Lightfoot      | Lori        | Lightfoot    | 17      | 39.16      | Store      |
| 1            | Kumar               | Anil        | B            | 19      | 45.45      | Store      |
| 2            | Kamesh              | Bh          | Venkata      | null    | 32.89      | Store      |
+--------------+---------------------+-------------+--------------+---------+------------+------------+
38 rows selected (0.253 seconds)
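
The shifted rows above look as if values from the file that has "last_name" were laid out by position under a header that lacks that column. The toy model below only reproduces that symptom with one row from the output; it is not a claim about how Drill works internally, and by_position / by_name are hypothetical helpers.

```python
# Header as printed in the parallel-plan output (no "last_name" column).
HEADER = ["employee_id", "full_name", "first_name",
          "position_id", "rating", "position", "isFTE"]

# One record from the file that still contains "last_name".
ROW = {"employee_id": 1101, "full_name": "Steve Eurich", "first_name": "Steve",
       "last_name": "Eurich", "position_id": 16, "rating": 23.0,
       "position": "Store T", "isFTE": True}


def by_position(header, row):
    """Pair header names with the row's values in order, ignoring the row's keys.

    zip() stops at the shorter input, so the row's eighth value is dropped.
    """
    return dict(zip(header, row.values()))


def by_name(header, row):
    """Look each header name up in the row; absent columns become None."""
    return {col: row.get(col) for col in header}


if __name__ == "__main__":
    print(by_position(HEADER, ROW))  # values shifted, as in the output above
    print(by_name(HEADER, ROW))      # values stay under their own columns
```

In this model by_position puts "Eurich" under position_id and "Store T" under isFTE, matching the shifted rows, while by_name keeps each value under its own column.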



