Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2016/10/25 01:41:58 UTC
[jira] [Created] (DRILL-4960) Wrong columns after scanning Json files where some files have missing columns
Boaz Ben-Zvi created DRILL-4960:
-----------------------------------
Summary: Wrong columns after scanning Json files where some files have missing columns
Key: DRILL-4960
URL: https://issues.apache.org/jira/browse/DRILL-4960
Project: Apache Drill
Issue Type: Bug
Components: Server
Affects Versions: 1.8.0
Environment: Mac
Reporter: Boaz Ben-Zvi
(This problem may not be specific to JSON.)
To reproduce: scan two small JSON files (e.g. two copies of contrib/storage-mongo/src/test/resources/emp.json), where one of the copies has an entire column removed (e.g. "last_name").
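The setup step above could be scripted roughly as follows. This is only a sketch under assumptions: the file names a_emp.json and b_emp.json are hypothetical, the two sample records are copied from the query output in this report rather than from the real emp.json, their value types are guesses, and the files are written as one JSON object per line.

```python
import json
from pathlib import Path

# Two sample rows copied from the query output in this report.
# The real emp.json has more rows; the value types here are assumptions.
RECORDS = [
    {"employee_id": 1101, "full_name": "Steve Eurich", "first_name": "Steve",
     "last_name": "Eurich", "position_id": 16, "rating": 23.0,
     "position": "Store T", "isFTE": True},
    {"employee_id": 1102, "full_name": "Mary Pierson", "first_name": "Mary",
     "last_name": "Pierson", "position_id": 16, "rating": 45.6,
     "position": "Store T", "isFTE": True},
]


def write_json_records(path, records):
    """Write the records to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def make_repro_dir(target_dir, records, drop_column="last_name"):
    """Create a complete file and a second file without `drop_column`."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    full_path = target / "a_emp.json"     # hypothetical name, sorts first
    partial_path = target / "b_emp.json"  # hypothetical name, sorts second
    write_json_records(full_path, records)
    stripped = [{k: v for k, v in rec.items() if k != drop_column}
                for rec in records]
    write_json_records(partial_path, stripped)
    return full_path, partial_path
```

Swapping the two file names afterwards gives the ordering in which the file without the column sorts first.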
A "normal" scan (the missing column shows up as nulls):
0: jdbc:drill:zk=local> select * from `drill/data/emp`;
+--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
| employee_id | full_name | first_name | last_name | position_id | rating | position | isFTE |
+--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
| 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T | true |
| 1102 | Mary Pierson | Mary | Pierson | 16 | 45.6 | Store T | true |
| 1103 | Leo Jones | Leo | Jones | 16 | 85.94 | Store Tem | true |
| 1104 | Nancy Beatty | Nancy | Beatty | 16 | 97.16 | Store T | false |
| 1105 | Clara McNight | Clara | McNight | 16 | 81.25 | Store | true |
| 1106 | null | Marcella | Isaacs | 17 | 67.86 | Stor | false |
| 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | 52.17 | Stor | true |
| 1108 | Benjamin Foster | Benjamin | Foster | 17 | 89.8 | Stor | false |
| 1109 | John Reed | John | Reed | 17 | 12.9 | Store Per | false |
| 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | 25.76 | St | true |
| 1111 | Donald Vann | Donald | Vann | 17 | 34.86 | Store Per | false |
| 1112 | null | William | Smith | null | 79.06 | St | true |
| 1113 | Amy Hensley | Amy | Hensley | 17 | 82.96 | Store Pe | false |
| 1114 | Judy Owens | Judy | Owens | 17 | 24.6 | Store Per | true |
| 1115 | Frederick Castillo | Frederick | Castillo | 17 | 82.36 | S | false |
| 1116 | Phil Munoz | Phil | Munoz | 17 | 97.63 | Store Per | false |
| 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | 39.16 | Store | true |
| 1 | Kumar | Anil | B | 19 | 45.45 | Store | true |
| 2 | Kamesh | Bh | Venkata | null | 32.89 | Store | true |
| 1101 | Steve Eurich | Steve | null | 16 | 23.0 | Store T | true |
| 1102 | Mary Pierson | Mary | null | 16 | 45.6 | Store T | true |
| 1103 | Leo Jones | Leo | null | 16 | 85.94 | Store Tem | true |
| 1104 | Nancy Beatty | Nancy | null | 16 | 97.16 | Store T | false |
| 1105 | Clara McNight | Clara | null | 16 | 81.25 | Store | true |
| 1106 | null | Marcella | null | 17 | 67.86 | Stor | false |
| 1107 | Charlotte Yonce | Charlotte | null | 17 | 52.17 | Stor | true |
| 1108 | Benjamin Foster | Benjamin | null | 17 | 89.8 | Stor | false |
| 1109 | John Reed | John | null | 17 | 12.9 | Store Per | false |
| 1110 | Lynn Kwiatkowski | Lynn | null | 17 | 25.76 | St | true |
| 1111 | Donald Vann | Donald | null | 17 | 34.86 | Store Per | false |
| 1112 | null | William | null | null | 79.06 | St | true |
| 1113 | Amy Hensley | Amy | null | 17 | 82.96 | Store Pe | false |
| 1114 | Judy Owens | Judy | null | 17 | 24.6 | Store Per | true |
| 1115 | Frederick Castillo | Frederick | null | 17 | 82.36 | S | false |
| 1116 | Phil Munoz | Phil | null | 17 | 97.63 | Store Per | false |
| 1117 | Lori Lightfoot | Lori | null | 17 | 39.16 | Store | true |
| 1 | Kumar | Anil | null | 19 | 45.45 | Store | true |
| 2 | Kamesh | Bh | null | null | 32.89 | Store | true |
+--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
38 rows selected (0.16 seconds)
But when the file whose name sorts first alphabetically is renamed so that it sorts second, that column ("last_name") is missing from the result:
0: jdbc:drill:zk=local> select * from foo;
+--------------+---------------------+-------------+--------------+---------+------------+--------+
| employee_id | full_name | first_name | position_id | rating | position | isFTE |
+--------------+---------------------+-------------+--------------+---------+------------+--------+
| 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |
| 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true |
| 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true |
| 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false |
| 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true |
| 1106 | null | Marcella | 17 | 67.86 | Stor | false |
| 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true |
| 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false |
| 1109 | John Reed | John | 17 | 12.9 | Store Per | false |
| 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true |
| 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false |
| 1112 | null | William | null | 79.06 | St | true |
| 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false |
| 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true |
| 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false |
| 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false |
| 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true |
| 1 | Kumar | Anil | 19 | 45.45 | Store | true |
| 2 | Kamesh | Bh | null | 32.89 | Store | true |
| 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |
| 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true |
| 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true |
| 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false |
| 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true |
| 1106 | null | Marcella | 17 | 67.86 | Stor | false |
| 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true |
| 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false |
| 1109 | John Reed | John | 17 | 12.9 | Store Per | false |
| 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true |
| 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false |
| 1112 | null | William | null | 79.06 | St | true |
| 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false |
| 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true |
| 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false |
| 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false |
| 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true |
| 1 | Kumar | Anil | 19 | 45.45 | Store | true |
| 2 | Kamesh | Bh | null | 32.89 | Store | true |
+--------------+---------------------+-------------+--------------+---------+------------+--------+
38 rows selected (0.261 seconds)
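To check which columns a `select *` would be expected to return regardless of file order, the union of keys across the files can be computed outside Drill. The sketch below assumes each file holds one JSON object per line and only looks at top-level keys; the function names are made up for this illustration and it says nothing about how Drill itself builds the schema.

```python
import json
from pathlib import Path


def columns_in_file(path):
    """Return the top-level keys seen in the file, in first-seen order."""
    seen = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            for key in json.loads(line):
                if key not in seen:
                    seen.append(key)
    return seen


def expected_columns(directory):
    """Union of columns over all *.json files, visited in sorted name order."""
    union = []
    for path in sorted(Path(directory).glob("*.json")):
        for key in columns_in_file(path):
            if key not in union:
                union.append(key)
    return union
```

The order of the returned list depends on which file is visited first, but the set of columns does not, so "last_name" should be in it for either file naming.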
But when the column is requested explicitly, it does show:
0: jdbc:drill:zk=local> select last_name from `drill/data/emp`;
+--------------+
| last_name |
+--------------+
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| null |
| Eurich |
| Pierson |
| Jones |
| Beatty |
| McNight |
| Isaacs |
| Yonce |
| Foster |
| Reed |
| Kwiatkowski |
| Vann |
| Smith |
| Hensley |
| Owens |
| Castillo |
| Munoz |
| Lightfoot |
| B |
| Venkata |
+--------------+
38 rows selected (0.159 seconds)
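Things get even worse when a parallel plan is chosen. In the second half of the output below, the last_name values appear under the position_id header, each later value shifts one column to the right, and the isFTE values are dropped: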
0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
+-------+--------------------------------+
| ok | summary |
+-------+--------------------------------+
| true | planner.slice_target updated. |
+-------+--------------------------------+
1 row selected (0.084 seconds)
0: jdbc:drill:zk=local> select * from `drill/data/emp`;
+--------------+---------------------+-------------+--------------+---------+------------+------------+
| employee_id | full_name | first_name | position_id | rating | position | isFTE |
+--------------+---------------------+-------------+--------------+---------+------------+------------+
| 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |
| 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true |
| 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true |
| 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false |
| 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true |
| 1106 | null | Marcella | 17 | 67.86 | Stor | false |
| 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true |
| 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false |
| 1109 | John Reed | John | 17 | 12.9 | Store Per | false |
| 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true |
| 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false |
| 1112 | null | William | null | 79.06 | St | true |
| 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false |
| 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true |
| 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false |
| 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false |
| 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true |
| 1 | Kumar | Anil | 19 | 45.45 | Store | true |
| 2 | Kamesh | Bh | null | 32.89 | Store | true |
| 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T |
| 1102 | Mary Pierson | Mary | Pierson | 16 | 45.6 | Store T |
| 1103 | Leo Jones | Leo | Jones | 16 | 85.94 | Store Tem |
| 1104 | Nancy Beatty | Nancy | Beatty | 16 | 97.16 | Store T |
| 1105 | Clara McNight | Clara | McNight | 16 | 81.25 | Store |
| 1106 | null | Marcella | Isaacs | 17 | 67.86 | Stor |
| 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | 52.17 | Stor |
| 1108 | Benjamin Foster | Benjamin | Foster | 17 | 89.8 | Stor |
| 1109 | John Reed | John | Reed | 17 | 12.9 | Store Per |
| 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | 25.76 | St |
| 1111 | Donald Vann | Donald | Vann | 17 | 34.86 | Store Per |
| 1112 | null | William | Smith | null | 79.06 | St |
| 1113 | Amy Hensley | Amy | Hensley | 17 | 82.96 | Store Pe |
| 1114 | Judy Owens | Judy | Owens | 17 | 24.6 | Store Per |
| 1115 | Frederick Castillo | Frederick | Castillo | 17 | 82.36 | S |
| 1116 | Phil Munoz | Phil | Munoz | 17 | 97.63 | Store Per |
| 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | 39.16 | Store |
| 1 | Kumar | Anil | B | 19 | 45.45 | Store |
| 2 | Kamesh | Bh | Venkata | null | 32.89 | Store |
+--------------+---------------------+-------------+--------------+---------+------------+------------+
38 rows selected (0.253 seconds)
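The shifted rows in the output above can be found mechanically. The sketch below is a hypothetical helper, not part of Drill or sqlline: it splits each printed row on "|" and flags rows whose value under isFTE is not true, false or null. It assumes no cell value contains a "|" character.

```python
BOOLEAN_CELLS = {"true", "false", "null"}


def parse_row(line):
    """Split a printed table row like '| a | b |' into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def misaligned_rows(header_line, row_lines, boolean_column="isFTE"):
    """Return the rows whose value under `boolean_column` is not a boolean or null."""
    header = parse_row(header_line)
    idx = header.index(boolean_column)
    bad = []
    for line in row_lines:
        cells = parse_row(line)
        # A different cell count or a non-boolean value both indicate a shift.
        if len(cells) != len(header) or cells[idx] not in BOOLEAN_CELLS:
            bad.append(cells)
    return bad


header = "| employee_id | full_name | first_name | position_id | rating | position | isFTE |"
rows = [
    "| 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |",
    "| 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T |",
]
for cells in misaligned_rows(header, rows):
    print(cells)
```

With the two sample rows copied from the output above, only the second one is reported, since "Store T" sits under isFTE.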