Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/06/06 18:59:21 UTC
[jira] [Updated] (DRILL-4710) Document Drill's JSON processing rules

     [ https://issues.apache.org/jira/browse/DRILL-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-4710:
-------------------------------
    Description: 
One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done, but unless someone happens to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.

We should document how Drill handles various JSON scenarios.

* SELECT * (schema inferred)
* SELECT a, b, c (schema implied by query)

And various JSON structures:

* Top-level structure (list of maps; can we handle an array of maps? A list of scalars?)
* Changes of the top-level map structure across rows.
** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10}.)
** Fields disappear later in the file
** Fields change type
** Start of file has many nulls for a field, later in file has non-null values.
* How Drill handles array fields
** Array field is null: { a: [10, 20]}, { a: null }
** Array contains nulls: { a: [10, null, 20] }
** Array contains single scalar type (number or string)
** Array contains multiple scalar types (number and string)
** Array contains structured types (array, map)
* How Drill handles nested maps
** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
** Implicit select: *
** How data is delivered to Drill client
** How data is delivered to JDBC/ODBC clients
* Size issues
** Very large records (what is max size?)
** Very large strings
** Very large arrays
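
As a starting point for the documentation work, the schema-change cases in the list above (a field that appears later, disappears, changes type, or starts out null) can be found in sample data with a small script. The following is a minimal sketch in Python, not a description of what Drill does internally. It assumes one JSON object per line with strictly quoted keys, and the names `json_type`, `summarize_fields`, and `SAMPLE` are made up for this illustration.

```python
import json


def json_type(value):
    """Return a coarse JSON type name for a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "map"


def summarize_fields(lines):
    """Collect, per top-level field, the types seen and how many records contain it.

    Assumes every non-blank line is a JSON object (a "map" in Drill terms).
    """
    summary = {}
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        total += 1
        for name, value in record.items():
            entry = summary.setdefault(name, {"types": set(), "count": 0})
            entry["types"].add(json_type(value))
            entry["count"] += 1
    return summary, total


# Hypothetical sample: "c" appears late, "b" disappears, "a" changes type.
SAMPLE = [
    '{"a": 1, "b": "s"}',
    '{"a": null, "b": "s", "c": 10}',
    '{"a": "text", "c": [10, null, 20]}',
]

summary, total = summarize_fields(SAMPLE)
for name in sorted(summary):
    entry = summary[name]
    missing = total - entry["count"]
    print(name, sorted(entry["types"]), "missing in", missing, "record(s)")
```

Any field that reports more than one type, or that is missing from some records, corresponds to one of the scenarios above and is a candidate test case for the documentation.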

Naming
* Support for case-sensitive names: { a: 1, A: "foo" }

The above is legal JSON, but it causes problems with Drill's case-insensitive naming rules.
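
The naming problem can be reproduced outside Drill. The sketch below, in Python, parses a record whose keys differ only by case and reports which names would collide once case is ignored. It assumes simple lowercase folding, which may not match Drill's exact comparison rules, and `case_collisions` is a hypothetical helper written for this example.

```python
import json
from collections import defaultdict


def case_collisions(record):
    """Group field names that are equal when compared case-insensitively."""
    groups = defaultdict(list)
    for name in record:
        # Assumption: plain lowercase folding stands in for Drill's rules.
        groups[name.lower()].append(name)
    return {folded: names for folded, names in groups.items() if len(names) > 1}


# Strict JSON requires quoted keys, so the example from the issue is quoted here.
record = json.loads('{"a": 1, "A": "foo", "b": 2}')
print(case_collisions(record))  # {'a': ['a', 'A']}
```

A JSON parser keeps both `a` and `A` as distinct fields, so the collision only shows up once the names are mapped onto case-insensitive column names.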

The documentation should also cover any other detailed information not covered by the above list.

  was:
One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done. But, unless someone happens to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.

We should document how Drill handles various JSON scenarios.

* SELECT * (schema inferred)
* SELECT a, b, c (schema implied by query)

And various JSON structures:

* Top-level structure (list of maps. Can we handle an array of maps? A list of scalars?)
* Changes of the top-level map structure across rows.
** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10}.)
** Fields disappear later in the file
** Fields change type
** Start of file has many nulls for a field, later in file has non-null values.
* How Drill handles array fields
** Array field is null: { a: [10, 20]}, { a: null }
** Array contains nulls: { a: [10, null, 20] }
** Array contains single scalar type (number or string)
** Array contains multiple scalar types (number and string)
** Array contains structured types (array, map)
* How Drill handles nested maps
** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
** Implicit select: *
** How data is delivered to Drill client
** How data is delivered to JDBC/ODBC clients
* Size issues
** Very large records (what is max size?)
** Very large strings
** Very large arrays

Along with any other detailed information not covered by the above list.


> Document Drill's JSON processing rules
> --------------------------------------
>
>                 Key: DRILL-4710
>                 URL: https://issues.apache.org/jira/browse/DRILL-4710
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Paul Rogers
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)