Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/11/26 18:18:00 UTC

[jira] [Commented] (DRILL-4824) Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.

    [ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266122#comment-16266122 ] 

Paul Rogers commented on DRILL-4824:
------------------------------------

Recently had reason to review Drill's existing List and Union support. Seems that the original Drill designers intended those to satisfy the use case outlined in this JIRA.

A Union can contain any combination of:

* Null value
* One or more Drill types

Thus, a union of a map provides a nullable map.

(The union does go on to allow, say, either a map or a Varchar, or a map or a list, or any other combination. So, a union by itself may be overkill to achieve just a nullable map.)
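
To make the union semantics concrete, here is a minimal Python sketch that models a union column as a per-row type tag plus a value. It is a conceptual model only, not Drill's actual vector implementation; the names {{UnionColumn}}, {{append}} and {{is_null}} are hypothetical and chosen just for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


# Conceptual model only: class and method names are hypothetical and do
# not correspond to Drill's real vector classes.
@dataclass
class UnionColumn:
    allowed: frozenset                                        # member type names
    tags: List[Optional[str]] = field(default_factory=list)   # None marks a null row
    values: List[Any] = field(default_factory=list)

    def append(self, tag: Optional[str], value: Any = None) -> None:
        """Add one row; a tag of None records a null."""
        if tag is not None and tag not in self.allowed:
            raise TypeError(f"type {tag!r} is not a member of this union")
        self.tags.append(tag)
        self.values.append(value if tag is not None else None)

    def is_null(self, row: int) -> bool:
        return self.tags[row] is None


# A union with a single member type behaves like a nullable map.
nullable_map = UnionColumn(allowed=frozenset({"map"}))
nullable_map.append("map", {"key1": "value1"})
nullable_map.append(None)        # null map, distinct from an empty map
nullable_map.append("map", {})   # empty map
```

The point of the sketch is that the null state lives in the tag, so a null map and an empty map stay distinguishable, which is the distinction this JIRA asks for.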

A list provides:

* An array of any type (including unions)
* A null flag per row.

Thus, with a list, one can have lists with empty or null values. For example, with repeated types, Drill can represent only {{\[1, 2, 3]}} but lists can also support {{\[1, null, 3]}}.
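
The difference can be sketched in a few lines of Python. This is a rough conceptual model under stated assumptions, not Drill's vector layout: the helper names {{encode_repeated}} and {{encode_list}} are hypothetical, and the per-element flag simply stands in for a nullable element type.

```python
from typing import List, Optional


# Conceptual model only; these helpers are hypothetical, not Drill APIs.
def encode_repeated(rows: List[List[int]]):
    """Repeated type: offsets plus values, with no place to record a null."""
    offsets, values = [0], []
    for row in rows:
        if row is None or any(v is None for v in row):
            raise ValueError("repeated types cannot represent nulls")
        values.extend(row)
        offsets.append(len(values))
    return offsets, values


def encode_list(rows: List[Optional[List[Optional[int]]]]):
    """List: same offsets, plus a set flag per row and per element."""
    offsets, values, elem_set, row_set = [0], [], [], []
    for row in rows:
        row_set.append(row is not None)           # null flag per row
        for v in (row or []):
            elem_set.append(v is not None)        # null flag per element
            values.append(0 if v is None else v)  # placeholder for a null slot
        offsets.append(len(values))
    return offsets, values, elem_set, row_set
```

With this model, {{encode_list(\[\[1, None, 3], None, \[]])}} keeps the null element, the null row, and the empty row distinct, while {{encode_repeated}} rejects the first two.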

The drawback seems to be that list and union support is limited in Drill. Unions have their own issues. (See DRILL-5955.)

But, if there is a JSON use case for nulls in maps in arrays, lists are the existing solution. Since they exist in the product, none of the compatibility issues discussed above apply: clients already understand lists (though we'd have to check whether the ODBC and JDBC clients handle them correctly).

Our effort would then turn to the many operators that do not currently support lists.

> Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman Kulyk
>            Assignee: Volodymyr Vysotskyi
>
> Drill produces incorrect output for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>         "Field1" : {
>         }
> }
> {
>         "Field1" : {
>                 "InnerField1": {"key1":"value1"},
>                 "InnerField2": {"key2":"value2"}
>         }
> }
> {
>         "Field1" : {
>                 "InnerField3" : ["value3", "value4"],
>                 "InnerField4" : ["value5", "value6"]
>         }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> |          Field1           |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. With a deeply nested structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> |         Field1           |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}


