Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/08/09 23:56:00 UTC

[jira] [Commented] (DRILL-6676) Add Union, List and Repeated List types to Result Set Loader

    [ https://issues.apache.org/jira/browse/DRILL-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575586#comment-16575586 ] 

ASF GitHub Bot commented on DRILL-6676:
---------------------------------------

paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429
 
 
   Previous commits provided the core "result set loader" (RSL) structure and support for the "mainstream" vector types, including structured types such as maps and lists.
   
   This PR adds the "obscure" (and partly implemented) types used for JSON: (non-repeated) list, repeated list and union.
   
   The union type is complex: it is a bundle of vectors keyed by type, and can accept new types as a run proceeds. A (non-repeated) list is highly complex: it can act like a repeated list, but with the ability to specify a null state for each entry. The non-repeated list can also act like a union type. This dual, morphing nature of a list required some rather complex behind-the-scenes magic to support the simple JSON-like interface used by the row set and result set loader mechanisms.
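   
   To make the "null state for each entry" point concrete, here is a minimal sketch of a list column that distinguishes a null entry from an empty one. The class name `NullableIntListSketch`, its methods, and the offsets-plus-flags layout are assumptions made for illustration; this is not Drill's list vector or its accessor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Drill's ListVector.
final class NullableIntListSketch {
  // Flattened element values for all entries.
  private final List<Integer> values = new ArrayList<>();
  // Entry i spans values[offsets[i]] .. values[offsets[i + 1]).
  private final List<Integer> offsets = new ArrayList<>();
  // Per-entry flag: false means the list entry itself is null.
  private final List<Boolean> isSet = new ArrayList<>();

  NullableIntListSketch() {
    offsets.add(0);
  }

  // Appends a non-null entry; calling with no arguments appends an empty list.
  void addList(int... elements) {
    for (int e : elements) {
      values.add(e);
    }
    offsets.add(values.size());
    isSet.add(true);
  }

  // Appends a null entry: no elements, and the flag records the null state.
  void addNull() {
    offsets.add(values.size());
    isSet.add(false);
  }

  boolean isNull(int row) {
    return !isSet.get(row);
  }

  // Returns a copy of the entry's elements, or null if the entry is null.
  List<Integer> get(int row) {
    if (isNull(row)) {
      return null;
    }
    return new ArrayList<>(values.subList(offsets.get(row), offsets.get(row + 1)));
  }

  int rowCount() {
    return isSet.size();
  }
}
```

   In this sketch, `addList()` with no arguments and `addNull()` advance the offsets identically; only the per-entry flag tells them apart, which is the distinction a plain repeated type cannot express.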
   
   This PR introduces the idea of a "variant" to model unions and non-repeated-lists-as-list-of-unions. The name is taken from Microsoft Basic and simply means a tagged union (where "union" is the term used in C).
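   
   As a rough sketch of the tagged-union idea, the following models a column as a bundle of per-type value stores keyed by type, with a per-row tag and lazy creation of a member the first time its type is written. All names here (`SketchType`, `VariantColumnSketch`, and the methods) are hypothetical and chosen for illustration; they are not the classes or accessors added by this PR.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; not Drill's UnionVector or its accessors.
enum SketchType { INT, VARCHAR }

final class VariantColumnSketch {
  // One value store per member type, created the first time that type is written.
  private final Map<SketchType, List<Object>> members = new EnumMap<>(SketchType.class);
  // Per-row tag naming the member that holds the row's value; null marks a null row.
  private final List<SketchType> tags = new ArrayList<>();

  void setInt(int value) {
    write(SketchType.INT, value);
  }

  void setString(String value) {
    write(SketchType.VARCHAR, value);
  }

  void setNull() {
    write(null, null);
  }

  private void write(SketchType type, Object value) {
    int row = tags.size();
    tags.add(type);
    if (type == null) {
      return;
    }
    // A new member type can appear at any row as the run proceeds.
    List<Object> store = members.computeIfAbsent(type, t -> new ArrayList<>());
    // Back-fill so every member store stays aligned with the row index.
    while (store.size() < row) {
      store.add(null);
    }
    store.add(value);
  }

  boolean hasType(SketchType type) {
    return members.containsKey(type);
  }

  SketchType typeAt(int row) {
    return tags.get(row);
  }

  Object valueAt(int row) {
    SketchType type = tags.get(row);
    return type == null ? null : members.get(type).get(row);
  }

  int rowCount() {
    return tags.size();
  }
}
```

   A reader of such a column first checks the tag for a row, then fetches the value from the matching member; a writer never needs to declare the set of types up front.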
   
   Changes include fixing a number of issues with the list vectors, adding support in the column accessors and metadata layers, and adding support for creating vectors from metadata and metadata from vectors.
   
   Unit tests both demonstrate how to use the new types and verify that they behave correctly.
   
   The focus of this PR is to enable union, list and repeated list support in the RSL and associated mechanisms. It is known that support for these vector types is incomplete: some operators fail when presented with such vectors. It is not the goal here to fix those issues: this is not a PR to fully support these types. Rather, the scope of this PR is limited to the RSL and associated classes.
   
   For more information, see [this wiki entry](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
   
   This PR completes the result set loader work. The next PR in this series will introduce revisions to the scan operator that allow readers to use the RSL. After that, there are revised implementations for the delimited text (e.g. CSV) and JSON readers.



> Add Union, List and Repeated List types to Result Set Loader
> ------------------------------------------------------------
>
>                 Key: DRILL-6676
>                 URL: https://issues.apache.org/jira/browse/DRILL-6676
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.15.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.15.0
>
>
> Add support for the "obscure" vector types to the {{ResultSetLoader}}:
> * Union
> * List
> * Repeated List


