Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/06/06 18:02:21 UTC

[jira] [Created] (DRILL-4710) Document Drill's JSON processing rules

Paul Rogers created DRILL-4710:
----------------------------------

             Summary: Document Drill's JSON processing rules
                 Key: DRILL-4710
                 URL: https://issues.apache.org/jira/browse/DRILL-4710
             Project: Apache Drill
          Issue Type: Improvement
          Components: Documentation
            Reporter: Paul Rogers
            Priority: Minor


One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done, but unless you happen to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.

We should document how Drill handles JSON for each form of query:

* SELECT * (schema inferred)
* SELECT a, b, c (schema implied by query)

And for various JSON structures:

* Top-level structure (a list of maps; can we handle an array of maps? A list of scalars?)
* Changes of the top-level map structure across rows.
** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10}.)
** Fields disappear later in the file
** Fields change type
** Start of file has many nulls for a field, later in file has non-null values.
* How Drill handles array fields
** Array field is null: { a: [10, 20] }, { a: null }
** Array contains nulls: { a: [10, null, 20] }
** Array contains single scalar type (number or string)
** Array contains multiple scalar types (number and string)
** Array contains structured types (array, map)
* How Drill handles nested maps
** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
** Implicit select: *
** How data is delivered to Drill client
** How data is delivered to JDBC/ODBC clients
* Size issues
** Very large records (what is max size?)
** Very large strings
** Very large arrays

Along with any other detailed information not covered by the above list.
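
To make the JSON cases above concrete, they could be captured as small sample files that the documentation (or a test) can refer to. The following is only a sketch in Python. The scenario names, the file layout (one strict-JSON object per line, with quoted keys) and the `render`/`write_fixtures` helpers are assumptions made for illustration. The "field_changes_type", "field_disappears", "leading_nulls" and "mixed_scalar_array" records are invented to match the descriptions; the others reuse the examples given in the list. It says nothing about how Drill actually treats each file, which is exactly what this issue asks to have documented.

```python
import json
import os

# Hypothetical scenario names mapped to sample records. Keys are quoted
# so that the output is strict JSON, unlike the shorthand in the list.
SCENARIOS = {
    "new_field_appears": [
        {"a": 1, "b": "s"},
        {"a": 1, "b": "s", "c": 10},
    ],
    "field_disappears": [
        {"a": 1, "b": "s", "c": 10},
        {"a": 1, "b": "s"},
    ],
    "field_changes_type": [
        {"a": 1},
        {"a": "one"},
    ],
    "leading_nulls": [
        {"a": None},
        {"a": None},
        {"a": 10},
    ],
    "null_array": [
        {"a": [10, 20]},
        {"a": None},
    ],
    "array_with_nulls": [
        {"a": [10, None, 20]},
    ],
    "mixed_scalar_array": [
        {"a": [10, "s"]},
    ],
    "nested_map": [
        {"a": 1, "b": {"c": "s", "d": 10}},
    ],
}


def render(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


def write_fixtures(directory="."):
    """Write one <name>.json file per scenario and return the paths."""
    paths = []
    for name, records in SCENARIOS.items():
        path = os.path.join(directory, name + ".json")
        with open(path, "w") as f:
            f.write(render(records))
        paths.append(path)
    return paths
```

Calling `write_fixtures("/tmp/drill-json")` on an existing directory would produce one file per scenario, each of which could then be queried with both SELECT * and an explicit column list to record the observed behavior. The size cases (very large records, strings and arrays) are left out because the issue does not state any limits.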



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)