You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Amit Katti (JIRA)" <ji...@apache.org> on 2014/07/25 07:58:38 UTC

[jira] [Commented] (DRILL-1151) JSON nested data returns empty for all rows except the first when using selections

    [ https://issues.apache.org/jira/browse/DRILL-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074113#comment-14074113 ] 

Amit Katti commented on DRILL-1151:
-----------------------------------

Managed to narrow down the cause of the Issue:
=========================================
While querying a nested Json which has an array of Json objects, if the query has a LIMIT keyword, all rows after the 1st one return an empty array

{code}
{"rownum":1, "arrayval": [ {"val1":"a1"}, {"val2":"a2"}, {"val3":"a3"} ]}
{"rownum":2, "arrayval": [ {"val1":"b1"}, {"val2":"b2"}, {"val3":"b3"} ]}
{"rownum":3, "arrayval": [ {"val1":"c1"}, {"val2":"c2"}, {"val3":"c3"} ]}
{"rownum":4, "arrayval": [ {"val1":"d1"}, {"val2":"d2"}, {"val3":"d3"} ]}
{"rownum":5, "arrayval": [ {"val1":"e1"}, {"val2":"e2"}, {"val3":"e3"} ]}

select * from `failing.json`;
+------------+------------+
|   rownum   |  arrayval  |
+------------+------------+
| 1          | [{"val1":"a1"},{"val2":"a2"},{"val3":"a3"}] |
| 2          | [{"val1":"b1"},{"val2":"b2"},{"val3":"b3"}] |
| 3          | [{"val1":"c1"},{"val2":"c2"},{"val3":"c3"}] |
| 4          | [{"val1":"d1"},{"val2":"d2"},{"val3":"d3"}] |
| 5          | [{"val1":"e1"},{"val2":"e2"},{"val3":"e3"}] |
+------------+------------+
5 rows selected (0.205 seconds)

select * from `failing.json` limit 3;
+------------+------------+
|   rownum   |  arrayval  |
+------------+------------+
| 1          | [{"val1":"a1"},{"val2":"a2"},{"val3":"a3"}] |
| 2          | []         |
| 3          | []         |
+------------+------------+
3 rows selected (0.137 seconds)
{code}

However if the array's element are simple values (primitive) instead of Json objects, the query with LIMIT works perfectly fine.

{code}
{"rownum":1, "arrayval": [ "a1", "a2", "a3"]}
{"rownum":2, "arrayval": [ "b1", "b2", "b3"]}
{"rownum":3, "arrayval": [ "c1", "c2", "c3"]}
{"rownum":4, "arrayval": [ "d1", "d2", "d3"]}
{"rownum":5, "arrayval": [ "e1", "e2", "e3"]}

select * from `working.json`;
+------------+------------+
|   rownum   |  arrayval  |
+------------+------------+
| 1          | ["a1","a2","a3"] |
| 2          | ["b1","b2","b3"] |
| 3          | ["c1","c2","c3"] |
| 4          | ["d1","d2","d3"] |
| 5          | ["e1","e2","e3"] |
+------------+------------+
5 rows selected (0.145 seconds)

select * from `working.json` limit 3;
+------------+------------+
|   rownum   |  arrayval  |
+------------+------------+
| 1          | ["a1","a2","a3"] |
| 2          | ["b1","b2","b3"] |
| 3          | ["c1","c2","c3"] |
+------------+------------+
3 rows selected (0.141 seconds)
{code}

> JSON nested data returns empty for all rows except the first when using selections
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-1151
>                 URL: https://issues.apache.org/jira/browse/DRILL-1151
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Norris Lee
>            Assignee: Jinfeng Ni
>            Priority: Critical
>
> Without the limit clause, everything gets returned:
> {code}
> 0: jdbc:drill:zk=localhost:5181> SELECT * FROM `dfs`.`optdrill`.`./raw-files/donuts/moredonuts.json`;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> |     id     |    type    |    name    |    ppu     |   sales    |  batters   |  topping   |  filling   |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 0001       | donut      | Cake       | 0.55       | 35         | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"100 |
> | 0002       | donut      | Raised     | 0.69       | 145        | {"batter":[{"id":"1001","type":"Regular"}]} | [{"id":"5001","type":"None"},{"id":"500 |
> | 0003       | donut      | Old Fashioned | 0.55       | 300        | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]} | [{" |
> | 0004       | donut      | Filled     | 0.69       | 14         | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"100 |
> | 0005       | donut      | Apple Fritter | 1.0        | 700        | {"batter":[{"id":"1001","type":"Regular"}]} | [{"id":"5002","type":"Glazed"}] | [] |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 5 rows selected (0.785 seconds)
> {code}
> Using the limit clause, the batters column returns [] for rows 2-5:
> {code}
> dbc:drill:zk=localhost:5181> SELECT * FROM `dfs`.`optdrill`.`./raw-files/donuts/moredonuts.json` LIMIT 100;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> |     id     |    type    |    name    |    ppu     |   sales    |  batters   |  topping   |  filling   |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 0001       | donut      | Cake       | 0.55       | 35         | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"100 |
> | 0002       | donut      | Raised     | 0.69       | 145        | {"batter":[]} | []         | []         |
> | 0003       | donut      | Old Fashioned | 0.55       | 300        | {"batter":[]} | []         | []         |
> | 0004       | donut      | Filled     | 0.69       | 14         | {"batter":[]} | []         | []         |
> | 0005       | donut      | Apple Fritter | 1.0        | 700        | {"batter":[]} | []         | []         |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 5 rows selected (0.543 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)