Posted to dev@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2021/06/22 09:46:00 UTC

[jira] [Created] (DRILL-7954) XML ability to not concatenate fields and attribute - change presentation of data

benj created DRILL-7954:
---------------------------

             Summary: XML ability to not concatenate fields and attribute - change presentation of data
                 Key: DRILL-7954
                 URL: https://issues.apache.org/jira/browse/DRILL-7954
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.19.0
            Reporter: benj


With an XML file containing this data:
{noformat}
<a>
  <attr>
    <set num="0" val="1">x</set>
    <set num="1" val="2">y</set>
  </attr>
  <attr>
    <set num="2" val="a">z</set>
    <set num="3" val="b">a</set>
  </attr>
</a>
{noformat}

{noformat}
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>1)) as x;
+-----------------------------------------------+----------------+
|                  attributes                   |      attr      |
+-----------------------------------------------+----------------+
| {"attr_set_num":"0123","attr_set_val":"12ab"} | {"set":"xyza"} |
+-----------------------------------------------+----------------+

apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>2)) as x;
+---------------------------------+-----+
|           attributes            | set |
+---------------------------------+-----+
| {"set_num":"01","set_val":"12"} | xy  |
| {"set_num":"23","set_val":"ab"} | za  |
+---------------------------------+-----+

apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>3)) as x;
+------------+
| attributes |
+------------+
| {}         |
| {}         |
| {}         |
| {}         |
+------------+
{noformat}

Attributes and fields that share a name are concatenated into a single string, which makes the result unusable _(being able to specify a separator might help, but that is not the point here)_.
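
To make the loss of information concrete, here is a rough Python emulation of the dataLevel=>1 output shown above. It is only a sketch that reproduces the observed result with the standard library; it is not Drill's XML reader, and the helper name flatten_concat is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<a>
  <attr>
    <set num="0" val="1">x</set>
    <set num="1" val="2">y</set>
  </attr>
  <attr>
    <set num="2" val="a">z</set>
    <set num="3" val="b">a</set>
  </attr>
</a>"""


def flatten_concat(root):
    """Emulate the observed behaviour (hypothetical helper, not Drill code):
    attribute keys carry the element path, and values that land on the
    same key are appended to one string."""
    attributes = defaultdict(str)
    fields = defaultdict(str)

    def walk(elem, path):
        name = "_".join(path + [elem.tag])
        for key, value in elem.attrib.items():
            attributes[f"{name}_{key}"] += value
        text = (elem.text or "").strip()
        if text:
            fields[elem.tag] += text
        for child in elem:
            walk(child, path + [elem.tag])

    # Start below the root element, as dataLevel=>1 appears to do.
    for child in root:
        walk(child, [])
    return dict(attributes), dict(fields)


attributes, fields = flatten_concat(ET.fromstring(XML))
print(attributes)  # {'attr_set_num': '0123', 'attr_set_val': '12ab'}
print(fields)      # {'set': 'xyza'}
```

Once the values are merged, nothing in "0123" or "12ab" says where one value ends and the next begins, nor which <set> element each one belonged to.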

What we really need is the ability to obtain something like the following _(depending on the chosen data level)_:
{noformat}
+---------------------------------------------------------------------------------------------------+
|                                               attr                                                |
+---------------------------------------------------------------------------------------------------+
| [{"set":"x","_attributes":{"num":"0","val":"1"}},{"set":"y","_attributes":{"num":"1","val":"2"}}] |
| [{"set":"z","_attributes":{"num":"2","val":"a"}},{"set":"a","_attributes":{"num":"3","val":"b"}}] |
+---------------------------------------------------------------------------------------------------+

+-------------------------------------------------+
|                       set                       |
+-------------------------------------------------+
| {"set":"x","_attributes":{"num":"0","val":"1"}} |
| {"set":"y","_attributes":{"num":"1","val":"2"}} |
| {"set":"z","_attributes":{"num":"2","val":"a"}} |
| {"set":"a","_attributes":{"num":"3","val":"b"}} |
+-------------------------------------------------+
{noformat}
The _attributes fields could be generated at each level, rather than once with a path from the top level. That would make it possible to work with the data at any level without losing information.
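
The requested shape can be sketched in a few lines of Python. This only illustrates the proposed structure using the standard library, assuming each element keeps its own attributes under an "_attributes" key; the helper name set_to_record is hypothetical and none of this is Drill code.

```python
import json
import xml.etree.ElementTree as ET

XML = """<a>
  <attr>
    <set num="0" val="1">x</set>
    <set num="1" val="2">y</set>
  </attr>
  <attr>
    <set num="2" val="a">z</set>
    <set num="3" val="b">a</set>
  </attr>
</a>"""


def set_to_record(elem):
    """Keep an element's text and its own attributes together
    (hypothetical helper illustrating the proposed layout)."""
    return {
        elem.tag: (elem.text or "").strip(),
        "_attributes": dict(elem.attrib),
    }


root = ET.fromstring(XML)

# One row per <attr>: each row is a list of <set> records.
rows_by_attr = [[set_to_record(s) for s in attr] for attr in root]

# One row per <set>: the same records, one level deeper.
rows_by_set = [set_to_record(s) for attr in root for s in attr]

for row in rows_by_attr:
    print(json.dumps(row, separators=(",", ":")))
for row in rows_by_set:
    print(json.dumps(row, separators=(",", ":")))
```

Because the attributes stay attached to the element they came from, the same records can be produced at either level and every num/val pair remains tied to its own text value.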