Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/10/01 12:25:26 UTC

[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector

    [ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939640#comment-14939640 ] 

Steven Phillips commented on DRILL-3229:
----------------------------------------

i) In this first iteration, Union types will be enabled with an option. When the option is enabled, the JSON reader and the Mongo reader will create them automatically, and everything will be a Union type. A future patch will work on promoting from non-union types only once promotion becomes necessary.
ii) Your understanding is correct, with one change from the earlier comment: there is no separate "bits" vector. The underlying primitive type vectors will have their own "bits" for tracking nulls. A value of zero in the type vector will also indicate null.
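
To make the layout in (ii) concrete, here is a minimal sketch, assuming a union of only two member types. It is not the code in the patch: the class name, the type codes other than zero, and the use of java.util.BitSet are all hypothetical. It shows a type vector where zero means null, plus per-type value vectors that each carry their own null-tracking bits, with no shared "bits" vector.

```java
import java.util.BitSet;

/** Illustrative sketch only; names and type codes are assumptions, not Drill's API. */
final class UnionVectorSketch {
    static final byte TYPE_NULL = 0;     // zero in the type vector indicates null
    static final byte TYPE_BIGINT = 1;   // hypothetical code
    static final byte TYPE_VARCHAR = 2;  // hypothetical code

    private final byte[] types;                        // the type vector, one code per row
    private final long[] bigIntValues;                 // underlying BIGINT vector
    private final BitSet bigIntBits = new BitSet();    // that vector's own null bits
    private final String[] varCharValues;              // underlying VARCHAR vector
    private final BitSet varCharBits = new BitSet();   // that vector's own null bits

    UnionVectorSketch(int capacity) {
        types = new byte[capacity];                    // all rows start as TYPE_NULL
        bigIntValues = new long[capacity];
        varCharValues = new String[capacity];
    }

    void setNull(int index) {
        types[index] = TYPE_NULL;
        bigIntBits.clear(index);
        varCharBits.clear(index);
        varCharValues[index] = null;
    }

    void setBigInt(int index, long value) {
        setNull(index);                                // drop any value of another type
        types[index] = TYPE_BIGINT;
        bigIntValues[index] = value;
        bigIntBits.set(index);
    }

    void setVarChar(int index, String value) {
        setNull(index);
        if (value == null) {
            return;                                    // a null string stays a null row
        }
        types[index] = TYPE_VARCHAR;
        varCharValues[index] = value;
        varCharBits.set(index);
    }

    byte typeOf(int index) {
        return types[index];
    }

    /** A row is null if its type code is zero or the member vector's bit is unset. */
    boolean isNull(int index) {
        switch (types[index]) {
            case TYPE_BIGINT:
                return !bigIntBits.get(index);
            case TYPE_VARCHAR:
                return !varCharBits.get(index);
            default:
                return true;
        }
    }

    Object get(int index) {
        if (isNull(index)) {
            return null;
        }
        if (types[index] == TYPE_BIGINT) {
            return Long.valueOf(bigIntValues[index]);
        }
        return varCharValues[index];
    }
}
```

In this sketch a reader first consults the type code and then the matching member vector, so null can be detected from either place.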

Without going into much detail at this point, I can answer your next paragraph of questions by saying that this patch will allow reading any valid JSON. It also represents the JSON more literally, e.g. null values will be treated as null instead of as empty maps/lists. The patch also includes functions for inspecting the type of a field, which can be used with case statements to handle the data based on its type. Though it may be somewhat cumbersome, with these tools you should be able to run almost any query against dynamic JSON data. This will generally involve using introspection and case statements to remove the Union types early in the query. Future work will eliminate the need for this in many cases. One notable exception is that flatten is not supported in this initial patch.
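
As a rough illustration of "introspection and case statements", the logic of collapsing a mixed-type field into one type can be sketched outside of SQL. The sketch below is an assumption-laden analogy in plain Java, not the functions shipped in the patch: the enum, the method names, and the coercion rules (truncating floats, parsing strings, mapping everything else to null) are all hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; mirrors the idea of a type-inspection function plus a CASE. */
final class UnionResolutionSketch {
    /** Hypothetical set of types a dynamic JSON field might hold. */
    enum JsonType { NULL, BIGINT, FLOAT8, VARCHAR, BIT, MAP, LIST }

    /** Introspection step: report which type a value currently has. */
    static JsonType typeOf(Object value) {
        if (value == null) return JsonType.NULL;
        if (value instanceof Long || value instanceof Integer) return JsonType.BIGINT;
        if (value instanceof Double || value instanceof Float) return JsonType.FLOAT8;
        if (value instanceof String) return JsonType.VARCHAR;
        if (value instanceof Boolean) return JsonType.BIT;
        if (value instanceof Map) return JsonType.MAP;
        if (value instanceof List) return JsonType.LIST;
        throw new IllegalArgumentException("unsupported value: " + value.getClass());
    }

    /** Case step: coerce every branch to BIGINT so later operators see one type. */
    static Long toBigInt(Object value) {
        switch (typeOf(value)) {
            case BIGINT:
                return ((Number) value).longValue();
            case FLOAT8:
                return (long) ((Number) value).doubleValue(); // truncates toward zero
            case VARCHAR:
                try {
                    return Long.parseLong(((String) value).trim());
                } catch (NumberFormatException e) {
                    return null; // non-numeric strings become null in this sketch
                }
            default:
                return null; // NULL, BIT, MAP and LIST have no BIGINT form here
        }
    }
}
```

Applying a step like this early means everything downstream deals with a single, non-union type.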

> Create a new EmbeddedVector
> ---------------------------
>
>                 Key: DRILL-3229
>                 URL: https://issues.apache.org/jira/browse/DRILL-3229
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Execution - Codegen, Execution - Data Types, Execution - Relational Operators, Functions - Drill
>            Reporter: Jacques Nadeau
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> Embedded Vector will leverage a binary encoding for holding information about type for each individual field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)