Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/06/19 18:45:00 UTC
[jira] [Assigned] (DRILL-4960) Wrong columns after scanning Json files where some files have missing columns
[ https://issues.apache.org/jira/browse/DRILL-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned DRILL-4960:
----------------------------------
Assignee: Paul Rogers
> Wrong columns after scanning Json files where some files have missing columns
> -----------------------------------------------------------------------------
>
> Key: DRILL-4960
> URL: https://issues.apache.org/jira/browse/DRILL-4960
> Project: Apache Drill
> Issue Type: Bug
> Components: Server
> Affects Versions: 1.8.0
> Environment: Mac
> Reporter: Boaz Ben-Zvi
> Assignee: Paul Rogers
>
> (This problem may be more general than just JSON.)
> To recreate: scan two small JSON files (e.g. two copies of contrib/storage-mongo/src/test/resources/emp.json) in which one of the files has a whole column (e.g. "last_name") removed.
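A minimal sketch of how such a pair of files could be produced, for anyone who wants to repeat the steps above. It is not part of the original report. The file names a_full.json and b_missing.json, the helper write_repro_files, and the one-object-per-line layout are assumptions made for illustration; the two sample records use the values exactly as they are displayed in the query output in this issue, not the contents of emp.json itself.

```python
import json
import tempfile
from pathlib import Path


def write_repro_files(records, target_dir, dropped_column="last_name"):
    """Write two JSON files; the second omits `dropped_column` from every record."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Hypothetical names: "a_full" sorts before "b_missing" alphabetically.
    full_path = target / "a_full.json"
    partial_path = target / "b_missing.json"

    # Assumed layout: one top-level JSON object per line.
    with full_path.open("w") as out:
        for rec in records:
            out.write(json.dumps(rec) + "\n")

    with partial_path.open("w") as out:
        for rec in records:
            trimmed = {k: v for k, v in rec.items() if k != dropped_column}
            out.write(json.dumps(trimmed) + "\n")

    return full_path, partial_path


# Two rows copied from the displayed query output (not from emp.json).
sample = [
    {"employee_id": 1101, "full_name": "Steve Eurich", "first_name": "Steve",
     "last_name": "Eurich", "position_id": 16, "rating": 23.0,
     "position": "Store T", "isFTE": True},
    {"employee_id": 1102, "full_name": "Mary Pierson", "first_name": "Mary",
     "last_name": "Pierson", "position_id": 16, "rating": 45.6,
     "position": "Store T", "isFTE": True},
]

repro_dir = tempfile.mkdtemp()
full_file, partial_file = write_repro_files(sample, repro_dir)
```

Swapping the two hypothetical file names changes which file sorts first, which is the condition the report varies between its first two queries.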
> A "normal" scan (the missing column shows up as nulls):
> 0: jdbc:drill:zk=local> select * from `drill/data/emp`;
> +--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
> | employee_id | full_name | first_name | last_name | position_id | rating | position | isFTE |
> +--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
> | 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T | true |
> | 1102 | Mary Pierson | Mary | Pierson | 16 | 45.6 | Store T | true |
> | 1103 | Leo Jones | Leo | Jones | 16 | 85.94 | Store Tem | true |
> | 1104 | Nancy Beatty | Nancy | Beatty | 16 | 97.16 | Store T | false |
> | 1105 | Clara McNight | Clara | McNight | 16 | 81.25 | Store | true |
> | 1106 | null | Marcella | Isaacs | 17 | 67.86 | Stor | false |
> | 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | 52.17 | Stor | true |
> | 1108 | Benjamin Foster | Benjamin | Foster | 17 | 89.8 | Stor | false |
> | 1109 | John Reed | John | Reed | 17 | 12.9 | Store Per | false |
> | 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | 25.76 | St | true |
> | 1111 | Donald Vann | Donald | Vann | 17 | 34.86 | Store Per | false |
> | 1112 | null | William | Smith | null | 79.06 | St | true |
> | 1113 | Amy Hensley | Amy | Hensley | 17 | 82.96 | Store Pe | false |
> | 1114 | Judy Owens | Judy | Owens | 17 | 24.6 | Store Per | true |
> | 1115 | Frederick Castillo | Frederick | Castillo | 17 | 82.36 | S | false |
> | 1116 | Phil Munoz | Phil | Munoz | 17 | 97.63 | Store Per | false |
> | 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | 39.16 | Store | true |
> | 1 | Kumar | Anil | B | 19 | 45.45 | Store | true |
> | 2 | Kamesh | Bh | Venkata | null | 32.89 | Store | true |
> | 1101 | Steve Eurich | Steve | null | 16 | 23.0 | Store T | true |
> | 1102 | Mary Pierson | Mary | null | 16 | 45.6 | Store T | true |
> | 1103 | Leo Jones | Leo | null | 16 | 85.94 | Store Tem | true |
> | 1104 | Nancy Beatty | Nancy | null | 16 | 97.16 | Store T | false |
> | 1105 | Clara McNight | Clara | null | 16 | 81.25 | Store | true |
> | 1106 | null | Marcella | null | 17 | 67.86 | Stor | false |
> | 1107 | Charlotte Yonce | Charlotte | null | 17 | 52.17 | Stor | true |
> | 1108 | Benjamin Foster | Benjamin | null | 17 | 89.8 | Stor | false |
> | 1109 | John Reed | John | null | 17 | 12.9 | Store Per | false |
> | 1110 | Lynn Kwiatkowski | Lynn | null | 17 | 25.76 | St | true |
> | 1111 | Donald Vann | Donald | null | 17 | 34.86 | Store Per | false |
> | 1112 | null | William | null | null | 79.06 | St | true |
> | 1113 | Amy Hensley | Amy | null | 17 | 82.96 | Store Pe | false |
> | 1114 | Judy Owens | Judy | null | 17 | 24.6 | Store Per | true |
> | 1115 | Frederick Castillo | Frederick | null | 17 | 82.36 | S | false |
> | 1116 | Phil Munoz | Phil | null | 17 | 97.63 | Store Per | false |
> | 1117 | Lori Lightfoot | Lori | null | 17 | 39.16 | Store | true |
> | 1 | Kumar | Anil | null | 19 | 45.45 | Store | true |
> | 2 | Kamesh | Bh | null | null | 32.89 | Store | true |
> +--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+
> 38 rows selected (0.16 seconds)
> But when the file whose name sorts first alphabetically is renamed so that it sorts second, that column ("last_name") does not appear at all:
> 0: jdbc:drill:zk=local> select * from foo;
> +--------------+---------------------+-------------+--------------+---------+------------+--------+
> | employee_id | full_name | first_name | position_id | rating | position | isFTE |
> +--------------+---------------------+-------------+--------------+---------+------------+--------+
> | 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |
> | 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true |
> | 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true |
> | 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false |
> | 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true |
> | 1106 | null | Marcella | 17 | 67.86 | Stor | false |
> | 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true |
> | 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false |
> | 1109 | John Reed | John | 17 | 12.9 | Store Per | false |
> | 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true |
> | 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false |
> | 1112 | null | William | null | 79.06 | St | true |
> | 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false |
> | 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true |
> | 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false |
> | 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false |
> | 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true |
> | 1 | Kumar | Anil | 19 | 45.45 | Store | true |
> | 2 | Kamesh | Bh | null | 32.89 | Store | true |
> | 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |
> | 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true |
> | 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true |
> | 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false |
> | 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true |
> | 1106 | null | Marcella | 17 | 67.86 | Stor | false |
> | 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true |
> | 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false |
> | 1109 | John Reed | John | 17 | 12.9 | Store Per | false |
> | 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true |
> | 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false |
> | 1112 | null | William | null | 79.06 | St | true |
> | 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false |
> | 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true |
> | 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false |
> | 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false |
> | 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true |
> | 1 | Kumar | Anil | 19 | 45.45 | Store | true |
> | 2 | Kamesh | Bh | null | 32.89 | Store | true |
> +--------------+---------------------+-------------+--------------+---------+------------+--------+
> 38 rows selected (0.261 seconds)
> But if the column is requested explicitly, it does appear:
> 0: jdbc:drill:zk=local> select last_name from `drill/data/emp`;
> +--------------+
> | last_name |
> +--------------+
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | null |
> | Eurich |
> | Pierson |
> | Jones |
> | Beatty |
> | McNight |
> | Isaacs |
> | Yonce |
> | Foster |
> | Reed |
> | Kwiatkowski |
> | Vann |
> | Smith |
> | Hensley |
> | Owens |
> | Castillo |
> | Munoz |
> | Lightfoot |
> | B |
> | Venkata |
> +--------------+
> 38 rows selected (0.159 seconds)
> Things get even WORSE when a parallel plan is chosen -- some column data shows up under the wrong columns:
> 0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
> +-------+--------------------------------+
> | ok | summary |
> +-------+--------------------------------+
> | true | planner.slice_target updated. |
> +-------+--------------------------------+
> 1 row selected (0.084 seconds)
> 0: jdbc:drill:zk=local> select * from `drill/data/emp`;
> +--------------+---------------------+-------------+--------------+---------+------------+------------+
> | employee_id | full_name | first_name | position_id | rating | position | isFTE |
> +--------------+---------------------+-------------+--------------+---------+------------+------------+
> | 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true |
> | 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true |
> | 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true |
> | 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false |
> | 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true |
> | 1106 | null | Marcella | 17 | 67.86 | Stor | false |
> | 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true |
> | 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false |
> | 1109 | John Reed | John | 17 | 12.9 | Store Per | false |
> | 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true |
> | 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false |
> | 1112 | null | William | null | 79.06 | St | true |
> | 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false |
> | 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true |
> | 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false |
> | 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false |
> | 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true |
> | 1 | Kumar | Anil | 19 | 45.45 | Store | true |
> | 2 | Kamesh | Bh | null | 32.89 | Store | true |
> | 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T |
> | 1102 | Mary Pierson | Mary | Pierson | 16 | 45.6 | Store T |
> | 1103 | Leo Jones | Leo | Jones | 16 | 85.94 | Store Tem |
> | 1104 | Nancy Beatty | Nancy | Beatty | 16 | 97.16 | Store T |
> | 1105 | Clara McNight | Clara | McNight | 16 | 81.25 | Store |
> | 1106 | null | Marcella | Isaacs | 17 | 67.86 | Stor |
> | 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | 52.17 | Stor |
> | 1108 | Benjamin Foster | Benjamin | Foster | 17 | 89.8 | Stor |
> | 1109 | John Reed | John | Reed | 17 | 12.9 | Store Per |
> | 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | 25.76 | St |
> | 1111 | Donald Vann | Donald | Vann | 17 | 34.86 | Store Per |
> | 1112 | null | William | Smith | null | 79.06 | St |
> | 1113 | Amy Hensley | Amy | Hensley | 17 | 82.96 | Store Pe |
> | 1114 | Judy Owens | Judy | Owens | 17 | 24.6 | Store Per |
> | 1115 | Frederick Castillo | Frederick | Castillo | 17 | 82.36 | S |
> | 1116 | Phil Munoz | Phil | Munoz | 17 | 97.63 | Store Per |
> | 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | 39.16 | Store |
> | 1 | Kumar | Anil | B | 19 | 45.45 | Store |
> | 2 | Kamesh | Bh | Venkata | null | 32.89 | Store |
> +--------------+---------------------+-------------+--------------+---------+------------+------------+
> 38 rows selected (0.253 seconds)
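The last result above looks like what one would get if rows from both files were laid out by position against a 7-column header: "last_name" values land under position_id, every later value shifts one column to the right, and isFTE is dropped for those rows. The sketch below only illustrates that symptom next to a merge by column name. It is not Drill's code and makes no claim about the actual root cause; merge_by_name and merge_by_position are hypothetical helpers, and the rows are shortened versions of the first record in the output.

```python
def merge_by_name(batches):
    """Union the column names in first-seen order and fill gaps with None."""
    columns = []
    for batch in batches:
        for row in batch:
            for name in row:
                if name not in columns:
                    columns.append(name)
    rows = [[row.get(name) for name in columns]
            for batch in batches for row in batch]
    return columns, rows


def merge_by_position(batches):
    """Take the header from the first batch and lay every row out by position."""
    columns = list(batches[0][0].keys())
    rows = []
    for batch in batches:
        for row in batch:
            values = list(row.values())[:len(columns)]       # extra values are cut off
            values += [None] * (len(columns) - len(values))  # short rows are padded
            rows.append(values)
    return columns, rows


# Shortened rows: one batch per file, the first file lacks "last_name".
without_last_name = [{"employee_id": 1101, "first_name": "Steve",
                      "position_id": 16, "isFTE": True}]
with_last_name = [{"employee_id": 1101, "first_name": "Steve",
                   "last_name": "Eurich", "position_id": 16, "isFTE": True}]

name_cols, name_rows = merge_by_name([without_last_name, with_last_name])
pos_cols, pos_rows = merge_by_position([without_last_name, with_last_name])

print(name_cols, name_rows)
print(pos_cols, pos_rows)
```

In the by-name result the missing column is filled with nulls, as in the first query of the report. In the by-position result "Eurich" ends up under position_id, which mirrors the misplaced values in the parallel-plan output.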