Posted to issues@drill.apache.org by "Pritesh Maker (JIRA)" <ji...@apache.org> on 2018/10/17 13:47:00 UTC

[jira] [Updated] (DRILL-4710) Document Drill's JSON processing rules

     [ https://issues.apache.org/jira/browse/DRILL-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-4710:
---------------------------------
    Fix Version/s: Future

> Document Drill's JSON processing rules
> --------------------------------------
>
>                 Key: DRILL-4710
>                 URL: https://issues.apache.org/jira/browse/DRILL-4710
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Paul Rogers
>            Priority: Minor
>             Fix For: Future
>
>
> One of Drill's key benefits is the ability to query JSON-formatted data, and much great work has gone into it. However, unless you happen to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.
> We should document how Drill handles various JSON scenarios, for each kind of query:
> * SELECT * (schema inferred)
> * SELECT a, b, c (schema implied by query)
> And various JSON structures:
> * Top-level structure (a list of maps; can we handle an array of maps? A list of scalars?)
> * Changes of the top-level map structure across rows.
> ** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10})
> ** Fields disappear later in the file
> ** Fields change type
> ** Start of file has many nulls for a field, later in file has non-null values.
> * How Drill handles array fields
> ** Array field is null: { a: [10, 20]}, { a: null }
> ** Array contains nulls: { a: [10, null, 20] }
> ** Array contains single scalar type (number or string)
> ** Array contains multiple scalar types (number and string)
> ** Array contains structured types (array, map)
> * How Drill handles nested maps
> ** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
> ** Implicit select: *
> ** How data is delivered to Drill client
> ** How data is delivered to JDBC/ODBC clients
> * Size issues
> ** Very large records (what is max size?)
> ** Very large strings
> ** Very large arrays
> Naming:
> * Support for case-sensitive names: { a: 1, A: "foo" }
> The above is legal JSON, but it causes problems with Drill's case-insensitive naming rules.
> The documentation should also include any other detailed information not covered by the above list.
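
To make a few of the scenarios above concrete, here is a minimal Python sketch that scans newline-delimited JSON records and flags top-level fields that appear late, are missing from some rows, change type, or differ only by case. It says nothing about how Drill itself behaves in these cases, which is exactly what the issue asks to have documented. The names json_type and scan_records are hypothetical helpers invented for this illustration, and the sample records use quoted keys because strict JSON requires them.

```python
import json
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "map"


def scan_records(lines):
    """Summarize how top-level fields vary across JSON records (one per line)."""
    types = defaultdict(set)   # field name -> type names seen
    rows = defaultdict(list)   # field name -> indexes of rows containing it
    total = 0
    for index, line in enumerate(lines):
        record = json.loads(line)
        total += 1
        for name, value in record.items():
            types[name].add(json_type(value))
            rows[name].append(index)

    # Group names that are equal when compared case-insensitively.
    by_folded = defaultdict(set)
    for name in rows:
        by_folded[name.casefold()].add(name)

    return {
        # Fields that are absent from the first record.
        "appears_late": sorted(n for n, r in rows.items() if r[0] > 0),
        # Fields that are absent from at least one record.
        "missing_in_some_rows": sorted(n for n, r in rows.items() if len(r) < total),
        # Fields with more than one non-null type.
        "mixed_types": {n: sorted(t) for n, t in types.items() if len(t - {"null"}) > 1},
        # Sets of field names that differ only by case.
        "case_collisions": sorted(sorted(g) for g in by_folded.values() if len(g) > 1),
    }


sample = [
    '{"a": 1, "b": "s"}',
    '{"a": 1, "b": "s", "c": 10}',
    '{"a": "x", "A": "foo"}',
]
report = scan_records(sample)
for key, value in report.items():
    print(key, value)
```

A report like this could serve as a checklist of inputs to try against Drill when writing the documentation; nested maps and array contents are deliberately left out to keep the sketch short.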



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)