Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/02/24 00:44:00 UTC

[jira] [Created] (DRILL-7597) Read selected JSON columns as JSON text

Paul Rogers created DRILL-7597:
----------------------------------

             Summary: Read selected JSON columns as JSON text
                 Key: DRILL-7597
                 URL: https://issues.apache.org/jira/browse/DRILL-7597
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.17.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.18.0


See . The use case requires reading selected JSON columns as JSON text rather than parsing the JSON into a relational structure, as the JSON reader does today.

The JSON reader supports "all text mode", but, despite the name, this mode applies only to scalars (primitives) such as numbers. It does not apply to structured types such as objects or arrays: such types are always parsed into Drill structures (which causes the conflict described in __).

Instead, we need a feature to read an entire JSON value, including structure, as a JSON string.
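To make "read an entire JSON value, including structure, as a JSON string" concrete, here is a minimal stdlib-only Java sketch. It finds the extent of one JSON value by tracking bracket depth and skipping over string contents, then returns the raw text unchanged, so no relational structure is built. `RawJsonCapture` and `capture` are hypothetical names, not Drill APIs, and the sketch assumes well-formed input (it does no validation). A real implementation would more likely build on the Jackson streaming parser that Drill's JSON reader already uses.

```java
/** Hypothetical helper that captures one JSON value as raw text without parsing its structure. */
public final class RawJsonCapture {

    /** The captured text and the offset just past it. */
    public static final class Capture {
        public final String text;
        public final int end;

        Capture(String text, int end) {
            this.text = text;
            this.end = end;
        }
    }

    /**
     * Returns the raw text of the JSON value starting at {@code start}
     * (leading whitespace is skipped). Assumes well-formed JSON.
     */
    public static Capture capture(String input, int start) {
        int i = start;
        while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
            i++;
        }
        if (i >= input.length()) {
            throw new IllegalArgumentException("no JSON value at offset " + start);
        }
        int begin = i;
        char first = input.charAt(i);
        if (first == '"') {
            i = skipString(input, i);
        } else if (first == '{' || first == '[') {
            // Track nesting depth; brackets inside strings are skipped, not counted.
            int depth = 0;
            do {
                char c = input.charAt(i);
                if (c == '"') {
                    i = skipString(input, i);
                    continue;
                }
                if (c == '{' || c == '[') {
                    depth++;
                } else if (c == '}' || c == ']') {
                    depth--;
                }
                i++;
            } while (depth > 0);
        } else {
            // Number, true, false, or null: read up to the next delimiter.
            while (i < input.length() && ",}] \t\r\n".indexOf(input.charAt(i)) < 0) {
                i++;
            }
        }
        return new Capture(input.substring(begin, i), i);
    }

    /** Returns the offset just past the closing quote of the string starting at {@code i}. */
    private static int skipString(String s, int i) {
        i++; // step over the opening quote
        while (s.charAt(i) != '"') {
            i += (s.charAt(i) == '\\') ? 2 : 1; // an escape consumes the next character too
        }
        return i + 1;
    }

    private RawJsonCapture() {}
}
```

For the record `{"a": 1, "c": {"x": [1, 2]}}`, capturing from just after the colon that follows `"c"` returns the text `{"x": [1, 2]}` unchanged; that string is what a VARCHAR column `c` would hold.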

This feature would work best if the user could parse some parts of a JSON input file into a relational structure and read others as JSON text. (This is the use case faced by the user on the user list.) So, we need a way to do that.

Drill has a "provided schema" feature, which at present is used only for text files (and, recently, with limited support for Avro). We are working on a project to add such support for JSON.

Perhaps we can leverage this feature to let the JSON reader read chunks of JSON as text, which those future JSON functions could then manipulate. In the example, column "c" would be read as JSON text; Drill would not attempt to parse it into a relational structure.

As it turns out, the "new" JSON reader we're working on originally had a feature to do just that, but we took it out because we were not sure it was needed. Sounds like we should restore it as part of our "provided schema" support. It could work this way: if you CREATE SCHEMA with column "c" as VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the entire nested structure as JSON without trying to parse it into a relational structure.
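The per-column decision proposed above can be sketched as follows. This is a hypothetical illustration only: the property name `drill.json-text` is invented here as a stand-in for the `CREATE SCHEMA` option that is still to be designed, and `JsonColumnMode.modeFor` is not a Drill API. The point it shows is the key difference: all text mode changes only scalars, while a VARCHAR column carrying the hint would capture even objects and arrays as JSON text.

```java
import java.util.Map;

/** Hypothetical sketch of how a reader might choose a per-column read mode. */
public final class JsonColumnMode {

    /** How the reader would handle one column's value. */
    public enum Mode {
        PARSE,          // parse into Drill structures (today's default)
        SCALAR_AS_TEXT, // all text mode: a scalar is read as VARCHAR text
        JSON_TEXT       // proposed: capture the whole value, including structure, as JSON text
    }

    /** Hypothetical column property; the real CREATE SCHEMA option is still to be designed. */
    public static final String JSON_HINT = "drill.json-text";

    /**
     * @param providedType type from the provided schema, or null if the column has none
     * @param props        column properties from the provided schema
     * @param allTextMode  whether the reader's all text mode is enabled
     * @param firstChar    first non-whitespace character of the JSON value
     */
    public static Mode modeFor(String providedType, Map<String, String> props,
                               boolean allTextMode, char firstChar) {
        boolean structured = firstChar == '{' || firstChar == '[';
        // Proposed rule: VARCHAR plus the (hypothetical) hint reads the entire value as JSON text.
        if ("VARCHAR".equals(providedType) && "true".equals(props.get(JSON_HINT))) {
            return Mode.JSON_TEXT;
        }
        // Existing behavior: all text mode applies only to scalars.
        if (allTextMode && !structured) {
            return Mode.SCALAR_AS_TEXT;
        }
        // Everything else, including objects and arrays in all text mode, is parsed.
        return Mode.PARSE;
    }

    private JsonColumnMode() {}
}
```

Keeping the hint separate from the VARCHAR type means a plain VARCHAR column without the hint keeps today's behavior; whether a hint is needed at all is one of the open design questions in this ticket.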

This ticket asks to build the concept:

* Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field to be read as JSON.
* Implement the "read column as JSON" feature in the new EVF-based JSON reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)