Posted to dev@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2021/06/22 09:46:00 UTC
[jira] [Created] (DRILL-7954) XML ability to not concatenate fields
and attribute - change presentation of data
benj created DRILL-7954:
---------------------------
Summary: XML ability to not concatenate fields and attribute - change presentation of data
Key: DRILL-7954
URL: https://issues.apache.org/jira/browse/DRILL-7954
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.19.0
Reporter: benj
With an XML file containing this data:
{noformat}
<a>
<attr>
<set num="0" val="1">x</set>
<set num="1" val="2">y</set>
</attr>
<attr>
<set num="2" val="a">z</set>
<set num="3" val="b">a</set>
</attr>
</a>
{noformat}
{noformat}
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>1)) as x;
+-----------------------------------------------+----------------+
| attributes                                    | attr           |
+-----------------------------------------------+----------------+
| {"attr_set_num":"0123","attr_set_val":"12ab"} | {"set":"xyza"} |
+-----------------------------------------------+----------------+
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>2)) as x;
+---------------------------------+-----+
| attributes                      | set |
+---------------------------------+-----+
| {"set_num":"01","set_val":"12"} | xy  |
| {"set_num":"23","set_val":"ab"} | za  |
+---------------------------------+-----+
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>3)) as x;
+------------+
| attributes |
+------------+
| {}         |
| {}         |
| {}         |
| {}         |
+------------+
{noformat}
Attributes and fields with the same name are concatenated, which leaves the result unusable _(the possibility of adding a separator might help, but that is not the point here)_.
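To make the information loss concrete, here is a small Python sketch that mimics the row reported for dataLevel=>1. It is only an emulation of the observed output, not Drill's actual implementation; the function name `concatenated_row` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<a>
  <attr>
    <set num="0" val="1">x</set>
    <set num="1" val="2">y</set>
  </attr>
  <attr>
    <set num="2" val="a">z</set>
    <set num="3" val="b">a</set>
  </attr>
</a>"""


def concatenated_row(root):
    """Hypothetical helper: join every value that shares the same path,
    which is what the dataLevel=>1 output above appears to do."""
    attributes = defaultdict(str)
    text = defaultdict(str)
    for attr in root.findall("attr"):
        for child in attr:
            # Text of same-named elements is appended to one string.
            text[child.tag] += child.text or ""
            # Attribute values are appended under a path-based key.
            for name, value in child.attrib.items():
                attributes[f"attr_{child.tag}_{name}"] += value
    return {"attributes": dict(attributes), "attr": dict(text)}


row = concatenated_row(ET.fromstring(XML))
print(row)
```

Once the values are joined into "0123" and "12ab", nothing in the row says where one `set` element ends and the next begins, or which `num` belongs to which text.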
What we really need is the ability to obtain something like the following _(depending on the defined level)_:
{noformat}
+---------------------------------------------------------------------------------------------------+
| attr                                                                                              |
+---------------------------------------------------------------------------------------------------+
| [{"set":"x","_attributes":{"num":"0","val":"1"}},{"set":"y","_attributes":{"num":"1","val":"2"}}] |
| [{"set":"z","_attributes":{"num":"2","val":"a"}},{"set":"a","_attributes":{"num":"3","val":"b"}}] |
+---------------------------------------------------------------------------------------------------+
+-------------------------------------------------+
| set                                             |
+-------------------------------------------------+
| {"set":"x","_attributes":{"num":"0","val":"1"}} |
| {"set":"y","_attributes":{"num":"1","val":"2"}} |
| {"set":"z","_attributes":{"num":"2","val":"a"}} |
| {"set":"a","_attributes":{"num":"3","val":"b"}} |
+-------------------------------------------------+
{noformat}
The _attributes fields could be generated at each level instead of being generated with the path from the top level. That would make it possible to work with the data from each level without losing information.
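As a rough illustration of the requested shape, the following Python sketch builds the two result sets shown above from the sample XML. It is not a proposal for how the Drill XML reader should be implemented; the helper names (`set_to_record`, `rows_at_attr_level`, `rows_at_set_level`) are hypothetical and exist only in this example.

```python
import json
import xml.etree.ElementTree as ET

XML = """<a>
  <attr>
    <set num="0" val="1">x</set>
    <set num="1" val="2">y</set>
  </attr>
  <attr>
    <set num="2" val="a">z</set>
    <set num="3" val="b">a</set>
  </attr>
</a>"""


def set_to_record(elem):
    """Keep an element's text and its own attributes together."""
    return {elem.tag: elem.text, "_attributes": dict(elem.attrib)}


def rows_at_attr_level(root):
    """One row per <attr>: a list of <set> records, nothing concatenated."""
    return [
        {"attr": [set_to_record(s) for s in attr.findall("set")]}
        for attr in root.findall("attr")
    ]


def rows_at_set_level(root):
    """One row per <set>, each carrying its own _attributes map."""
    return [{"set": set_to_record(s)} for s in root.iter("set")]


root = ET.fromstring(XML)
for r in rows_at_attr_level(root):
    print(json.dumps(r["attr"], separators=(",", ":")))
for r in rows_at_set_level(root):
    print(json.dumps(r["set"], separators=(",", ":")))
```

Because each record carries its own `_attributes` map, the pairing between a value and its attributes survives at whichever level the rows are produced.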