You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Amit Katti (JIRA)" <ji...@apache.org> on 2014/07/25 06:39:38 UTC

[jira] [Created] (DRILL-1195) Querying nested array elements in Json returns only null values

Amit Katti created DRILL-1195:
---------------------------------

             Summary: Querying nested array elements in Json returns only null values
                 Key: DRILL-1195
                 URL: https://issues.apache.org/jira/browse/DRILL-1195
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON
            Reporter: Amit Katti


If a Json obejct has an array which is present at the 1st(uppermost) level it can be queried in Drill successfully as follows

{code}
{"rownum":1, "arrayval": [ "a1", "a2", "a3" ]}
{"rownum":2, "arrayval": [ "b1", "b2", "b3" ]}
{"rownum":3, "arrayval": [ "c1", "c2", "c3" ]}
{"rownum":4, "arrayval": [ "d1", "d2", "d3" ]}
{"rownum":5, "arrayval": [ "e1", "e2", "e3" ]}

select tbl.arrayval[0] from `nested_working.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| a1         |
| b1         |
| c1         |
| d1         |
| e1         |
+------------+
5 rows selected (0.157 seconds)
{code}

However if the array is present in the 2nd level or below (inner level), directly querying the array's elements throws only null values. 

{code}
{"rownum":1, "a":{"arrayval": [ "a1", "a2", "a3" ]}}
{"rownum":2, "a":{"arrayval": [ "b1", "b2", "b3" ]}}
{"rownum":3, "a":{"arrayval": [ "c1", "c2", "c3" ]}}
{"rownum":4, "a":{"arrayval": [ "d1", "d2", "d3" ]}}
{"rownum":5, "a":{"arrayval": [ "e1", "e2", "e3" ]}}

select tbl.a.arrayval[0] from `nested_failing.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| null       |
| null       |
| null       |
| null       |
| null       |
+------------+
5 rows selected (0.144 seconds)
{code}

Select * as well as selecting the whole array works fine though.

{code}
select tbl.a.arrayval from `/user/root/nested_failing.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| ["a1","a2","a3"] |
| ["b1","b2","b3"] |
| ["c1","c2","c3"] |
| ["d1","d2","d3"] |
| ["e1","e2","e3"] |
+------------+
5 rows selected (0.151 seconds)
{code}

On the other hand if the array's elements are json objects again, we are able to successfully query them. The issue seems to be centric only to single element objects.

{code}
{"rownum":1, "a":{"arrayval": [ {"val1":"a1"}, {"val2":"a2"}, {"val3":"a3"} ]}}
{"rownum":2, "a":{"arrayval": [ {"val1":"b1"}, {"val2":"b2"}, {"val3":"b3"} ]}}
{"rownum":3, "a":{"arrayval": [ {"val1":"c1"}, {"val2":"c2"}, {"val3":"c3"} ]}}
{"rownum":4, "a":{"arrayval": [ {"val1":"d1"}, {"val2":"d2"}, {"val3":"d3"} ]}}
{"rownum":5, "a":{"arrayval": [ {"val1":"e1"}, {"val2":"e2"}, {"val3":"e3"} ]}}

select tbl.a.arrayval[0] from `nested_working2.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| {"val1":"a1"} |
| {"val1":"b1"} |
| {"val1":"c1"} |
| {"val1":"d1"} |
| {"val1":"e1"} |
+------------+
5 rows selected (0.186 seconds)

select tbl.a.arrayval[0].val1 from `nested_working2.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| a1         |
| b1         |
| c1         |
| d1         |
| e1         |
+------------+
5 rows selected (0.157 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)