Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/10/17 17:37:00 UTC
[jira] [Commented] (DRILL-4710) Document Drill's JSON processing rules

    [ https://issues.apache.org/jira/browse/DRILL-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653930#comment-16653930 ] 

Paul Rogers commented on DRILL-4710:
------------------------------------

As it turns out, the learnings here went into the "Data Engineering" chapter in the upcoming O'Reilly book "Learning Apache Drill." So, even if these rules are not documented on the Drill website, they are documented in the book.

> Document Drill's JSON processing rules
> --------------------------------------
>
>                 Key: DRILL-4710
>                 URL: https://issues.apache.org/jira/browse/DRILL-4710
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Paul Rogers
>            Priority: Minor
>             Fix For: Future
>
>
> One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done. But, unless someone happens to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.
> We should document how Drill handles various JSON scenarios.
> * SELECT * (schema inferred)
> * SELECT a, b, c (schema implied by query)
> And various JSON structures:
> * Top-level structure (list of maps. Can we handle an array of maps? A list of scalars?)
> * Changes of the top-level map structure across rows.
> ** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10}.)
> ** Fields disappear later in the file
> ** Fields change type
> ** Start of file has many nulls for a field, later in file has non-null values.
> * How Drill handles array fields
> ** Array field is null: { a: [10, 20]}, { a: null }
> ** Array contains nulls: { a: [10, null, 20] }
> ** Array contains single scalar type (number or string)
> ** Array contains multiple scalar types (number and string)
> ** Array contains structured types (array, map)
> * How Drill handles nested maps
> ** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
> ** Implicit select: *
> ** How data is delivered to Drill client
> ** How data is delivered to JDBC/ODBC clients
> * Size issues
> ** Very large records (what is max size?)
> ** Very large strings
> ** Very large arrays
> Naming:
> * Support for case-sensitive names: { a: 1, A: "foo" }
> The above is legal JSON, but causes problems with the case-insensitive naming rules of Drill.
> Along with any other detailed information not covered by the above list.
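
Several of the scenarios listed above (a field that first appears later in the file, a field that disappears, a field that changes type, and names that differ only by case) can be spotted in a data file before it is queried. The following is a minimal sketch in Python, assuming line-delimited JSON with one top-level map per line. It does not reproduce or describe Drill's own behavior; the names `json_type` and `survey` are hypothetical helpers introduced only for this illustration, and nested maps and array elements are not inspected.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a parsed JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "map"


def survey(lines):
    """Summarize top-level schema variation across line-delimited JSON records."""
    types = defaultdict(set)   # field name -> type names seen for that field
    counts = defaultdict(int)  # field name -> number of records containing it
    first_seen = {}            # field name -> index of the first record containing it
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for name, value in record.items():
            types[name].add(json_type(value))
            counts[name] += 1
            first_seen.setdefault(name, total)
        total += 1

    # Group names that are identical when case is ignored.
    by_lower = defaultdict(set)
    for name in types:
        by_lower[name.lower()].add(name)

    return {
        # Fields that do not appear in the first record.
        "late": sorted(n for n, i in first_seen.items() if i > 0),
        # Fields absent from at least one record.
        "sparse": sorted(n for n, c in counts.items() if c < total),
        # Fields with more than one non-null type.
        "mixed": sorted(n for n, t in types.items() if len(t - {"null"}) > 1),
        # Groups of names that collide under case-insensitive matching.
        "collisions": sorted(sorted(g) for g in by_lower.values() if len(g) > 1),
    }


if __name__ == "__main__":
    sample = [
        '{"a": 1, "b": "s"}',
        '{"a": 1, "b": "s", "c": 10}',
        '{"a": "x", "A": "foo"}',
    ]
    for key, value in survey(sample).items():
        print(key, value)
```

A report like this only flags where a file exercises the cases above; how Drill actually treats each case is exactly what the documentation requested here would need to spell out.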



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)