Posted to dev@drill.apache.org by Mike Beckerle <mb...@apache.org> on 2023/10/10 11:56:09 UTC

Question about Drill internal data representation for Daffodil tree infosets

I am trying to understand the options for populating Drill data from a
Daffodil data parse.

Suppose you have this JSON

{"parent": { "sub1": { "a1":1, "a2":2}, "sub2":{"b1":3, "b2":4, "b3":5}}}

or this equivalent XML:

<parent>
  <sub1><a1>1</a1><a2>2</a2></sub1>
  <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
</parent>

Unlike those texts, Daffodil is going to have a tree data structure where a
parent node contains two child nodes sub1 and sub2, and each of those has
children a1, a2, and b1, b2, b3 respectively.
It's analogous roughly to the DOM tree of the XML, or the tree of nested
JSON map nodes you'd get back from a JSON parse of that text.
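
As a rough illustration of that tree shape (a sketch in Python, not
Daffodil's or Drill's actual API), the snippet below parses a valid-JSON
form of the text above and the equivalent XML into nested map nodes and
checks that both give the same tree. The helper name xml_to_tree and the
assumption that every leaf holds an integer are made up for this example.

```python
import json
import xml.etree.ElementTree as ET

JSON_TEXT = '{"parent": {"sub1": {"a1": 1, "a2": 2}, "sub2": {"b1": 3, "b2": 4, "b3": 5}}}'
XML_TEXT = (
    "<parent>"
    "<sub1><a1>1</a1><a2>2</a2></sub1>"
    "<sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>"
    "</parent>"
)


def xml_to_tree(elem):
    """Hypothetical helper: turn an element into nested dicts.

    Assumes every leaf element holds an integer, as in the example data.
    """
    children = list(elem)
    if not children:
        return int(elem.text)
    return {child.tag: xml_to_tree(child) for child in children}


# Tree of nested map nodes from the JSON parse.
json_tree = json.loads(JSON_TEXT)

# The same shape rebuilt from the XML element tree.
root = ET.fromstring(XML_TEXT)
xml_tree = {root.tag: xml_to_tree(root)}

# sub1 is a node with two children, not a piece of text.
sub1 = json_tree["parent"]["sub1"]
print(type(sub1).__name__, list(sub1))
print(json_tree == xml_tree)
```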

In Drill, querying the JSON like this:

select parent.sub1 from myStructure

gives you back a single column containing what seems to be a string like

|        sub1        |
----------------------
| { "a1":1, "a2":2}  |

So, my question is this: is this actually a string in Drill (what is the
type of sub1?), or is sub1 actually a Drill data row/map node value with two
node children that just happens to print out looking like a JSON string?

Thanks for any insight here.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com

Re: Question about Drill internal data representation for Daffodil tree infosets

Posted by Paul Rogers <pa...@gmail.com>.
Mike,

Just to echo Charles, thanks for the work; sounds like you are making good
progress.

The question you asked is tricky. Charles is right: the type of the data
structure is a map. The output you showed appears to be from the sqlline
tool. If so, then it helps to understand that sqlline "cheats" by
converting maps to strings for display, making it look like you have a
string column.

Also, remember that Drill uses the standard JSON structure internally, just
as you described. However, referencing any column projects it to the top
level. Clients don't understand complex JSON types (maps, arrays, etc.).
Sqlline compensates by converting the data to strings for display.
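
To make the distinction concrete, here is a small Python analogy (not
Drill or sqlline code; render_cell is a hypothetical helper invented for
this sketch). The value stays a map with two children, and only the display
step turns it into text.

```python
import json

# One result row whose single column holds a map value.
row = {"sub1": {"a1": 1, "a2": 2}}


def render_cell(value):
    """Hypothetical display helper: complex values become JSON text,
    scalar values are rendered with str()."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)


# What a display tool would show for the cell.
cell = render_cell(row["sub1"])

print(type(row["sub1"]).__name__)  # the stored value is still a map
print(type(cell).__name__)         # only the rendered cell is a string
print(cell)
```

Parsing the rendered text back with json.loads recovers the original map,
which is why the displayed cell looks like JSON even though the column
itself is not a string.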

- Paul

On Tue, Oct 10, 2023 at 12:55 PM Charles Givre <cg...@gmail.com> wrote:

> Hi Mike,
> Thanks for all the work you are doing on Drill.
>
> To answer your question, sub1 should be treated as a map in Drill.  You
> can verify this with the following query:
>
> SELECT drillTypeOf(sub1) FROM...
>
> In general, I'm pretty sure that Drill doesn't output strings that look
> like JSON objects unless they actually are complex objects.
>
> Take a look here for data type functions:
> https://drill.apache.org/docs/data-type-functions/
> Best,
> -- C

Re: Question about Drill internal data representation for Daffodil tree infosets

Posted by Charles Givre <cg...@gmail.com>.
Hi Mike, 
Thanks for all the work you are doing on Drill.

To answer your question, sub1 should be treated as a map in Drill. You can verify this with the following query:

SELECT drillTypeOf(sub1) FROM...

In general, I'm pretty sure that Drill doesn't output strings that look like JSON objects unless they actually are complex objects.

Take a look here for data type functions:  https://drill.apache.org/docs/data-type-functions/
Best,
-- C