Posted to issues@drill.apache.org by "Jiang Wu (JIRA)" <ji...@apache.org> on 2016/03/11 19:21:39 UTC
[jira] [Created] (DRILL-4498) Projecting a map key within an array produces incorrect results
Jiang Wu created DRILL-4498:
-------------------------------
Summary: Projecting a map key within an array produces incorrect results
Key: DRILL-4498
URL: https://issues.apache.org/jira/browse/DRILL-4498
Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
Affects Versions: 1.4.0
Reporter: Jiang Wu
To reproduce:
1) Place the following 3 JSON objects in a file, one per line:
{noformat}
{"r":1,"c1":[{"c2":1,"c3":"a"},{"c2":2,"c3":"b"},{"c2":3,"c3":"c"}]}
{"r":2,"c1":[{"c2":4,"c3":"d"}]}
{"r":3,"c1":[{"c2":5,"c3":"e"},{"c2":6,"c3":"f"},{"c2":7,"c3":"g"}]}
{noformat}
2) Run the query:
{noformat}
select t.r, t.c1.c2 from dfs.`c:\tmp\data.json` t;
+----+---------+
| r  | EXPR$1  |
+----+---------+
| 1  | 1       |
| 2  | 2       |  <-- not OK
| 3  | 3       |  <-- not OK
+----+---------+
{noformat}
3) The above results are incorrect. After the first row, the returned values for "c1.c2" do not correspond to the value of r in the same row. The expected result would show, for example, that r = 1 has 3 values for c1.c2: 1, 2, and 3.
For example, the conceptually equivalent query in MongoDB returns the proper information:
{noformat}
> db.t.find({}, {"r":1, "c1.c2":1});
{"r":1,"c1":[{"c2":1},{"c2":2},{"c2":3}]}
{"r":2,"c1":[{"c2":4}]}
{"r":3,"c1":[{"c2":5},{"c2":6},{"c2":7}]}
{noformat}
Drill could return the same information, even if it is formatted differently, in a more relational style. For example:
{noformat}
select t.r, t.c1.c2 from dfs.`c:\tmp\data.json` t;
+----+-----------+
| r  | EXPR$1    |
+----+-----------+
| 1  | [1, 2, 3] |
| 2  | [4]       |
| 3  | [5, 6, 7] |
+----+-----------+
{noformat}
Some other output format would also be acceptable.
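To make the expected semantics concrete, here is a minimal sketch in plain Python (not Drill code) that projects c2 out of each element of c1 while keeping the values tied to their own row. The helper name project_c2 is hypothetical and only for illustration; the data is the three records from step 1.

```python
import json

# The three records from step 1, one JSON object per line.
DATA = """\
{"r":1,"c1":[{"c2":1,"c3":"a"},{"c2":2,"c3":"b"},{"c2":3,"c3":"c"}]}
{"r":2,"c1":[{"c2":4,"c3":"d"}]}
{"r":3,"c1":[{"c2":5,"c3":"e"},{"c2":6,"c3":"f"},{"c2":7,"c3":"g"}]}
"""


def project_c2(text):
    """Hypothetical helper: return one (r, [c2, ...]) pair per input record."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Collect c2 from every map in this record's c1 array only.
        rows.append((record["r"], [item["c2"] for item in record["c1"]]))
    return rows


for r, c2_values in project_c2(DATA):
    print(r, c2_values)
# 1 [1, 2, 3]
# 2 [4]
# 3 [5, 6, 7]
```

Each row keeps only the c2 values from its own c1 array, which is the correlation the actual Drill output loses.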
Returning an array of values is an important use case. It would support operations such as forming a single comma-separated string "1, 2, 3" without flattening and then re-aggregating, and predicates such as "where ... xyz in c1.c2 ...".
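As an illustration of those two use cases, the following plain-Python sketch (again not Drill code) starts from the per-row arrays shown in the expected output. The function names as_csv and rows_containing are hypothetical, and the rows are hard-coded so the sketch is self-contained.

```python
# Per-row c1.c2 arrays, as in the expected output table: (r, [c2, ...]).
ROWS = [(1, [1, 2, 3]), (2, [4]), (3, [5, 6, 7])]


def as_csv(values):
    """Join one row's array into a single comma-separated string."""
    return ", ".join(str(v) for v in values)


def rows_containing(rows, wanted):
    """Return r for each row whose array contains `wanted`,
    mimicking a predicate like "where wanted in c1.c2"."""
    return [r for r, values in rows if wanted in values]


print(as_csv(ROWS[0][1]))        # 1, 2, 3
print(rows_containing(ROWS, 6))  # [3]
```

With the array available per row, neither operation needs a flatten step followed by re-aggregation.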