Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/15 22:13:04 UTC

[jira] [Created] (DRILL-5514) Enhance VectorContainer to merge two row sets

Paul Rogers created DRILL-5514:
----------------------------------

             Summary: Enhance VectorContainer to merge two row sets
                 Key: DRILL-5514
                 URL: https://issues.apache.org/jira/browse/DRILL-5514
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor
             Fix For: 1.11.0


Consider the concept of a "record batch" in Drill. Conceptually, one can envision a record batch as a stack of records:

{code}
| a1 | b1 | c1 |
----------------
| a2 | b2 | c2 |
{code}

But Drill is columnar, so a record batch is really a "bundle" of vectors:

{code}
| a1 |    | b1 |    | c1 |
| a2 |    | b2 |    | c2 |
{code}

There are times when it is handy to build up a record batch as a merge of two different vector bundles:

{code}
-- bundle 1 --    -- bundle 2 --
| a1 |    | b1 |        | c1 |
| a2 |    | b2 |        | c2 |
{code}

For example, consider a reader. The reader implementation might read columns (a, b) from a file. The {{ScanBatch}} might then add (c) as an implicit vector (the file name, say). The merged set of vectors comprises the final schema: (a, b, c).

This ticket asks for the code to do the merge:

* Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
* Merge two vector containers C1 and C2 to create a new container, C3, that holds the merger of the vectors from the first two.
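
To make the two requested operations concrete, here is a minimal sketch in plain Java. It is a toy model, not Drill code: the class {{BundleMerge}} and its methods are hypothetical, a "bundle" is modeled as a map from column name to a list of values, and the checks for duplicate names and mismatched row counts are assumptions about reasonable behavior rather than anything this ticket specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a "bundle" maps a column name to its values. Not Drill's API. */
final class BundleMerge {

  /** Schema merge: A = (a, b) and B = (c) give C = (a, b, c). */
  static List<String> mergeSchemas(List<String> first, List<String> second) {
    List<String> merged = new ArrayList<>(first);
    for (String name : second) {
      // Assumption: a name that appears in both schemas is an error.
      if (merged.contains(name)) {
        throw new IllegalArgumentException("Duplicate column: " + name);
      }
      merged.add(name);
    }
    return merged;
  }

  /** Container merge: C3 holds the columns of C1 followed by those of C2. */
  static Map<String, List<Object>> mergeBundles(
      Map<String, List<Object>> c1, Map<String, List<Object>> c2) {
    // Assumption: both bundles must describe the same number of rows.
    if (!c1.isEmpty() && !c2.isEmpty() && rowCount(c1) != rowCount(c2)) {
      throw new IllegalArgumentException("Row counts differ");
    }
    // The column lists are shared with the inputs, not copied.
    Map<String, List<Object>> c3 = new LinkedHashMap<>(c1);
    for (Map.Entry<String, List<Object>> column : c2.entrySet()) {
      if (c3.putIfAbsent(column.getKey(), column.getValue()) != null) {
        throw new IllegalArgumentException("Duplicate column: " + column.getKey());
      }
    }
    return c3;
  }

  /** Row count of a bundle, taken from its first column (0 if it has none). */
  private static int rowCount(Map<String, List<Object>> bundle) {
    for (List<Object> column : bundle.values()) {
      return column.size();
    }
    return 0;
  }

  public static void main(String[] args) {
    // Bundle 1: columns (a, b), as a reader might produce them.
    Map<String, List<Object>> c1 = new LinkedHashMap<>();
    c1.put("a", List.of("a1", "a2"));
    c1.put("b", List.of("b1", "b2"));

    // Bundle 2: column (c), standing in for an implicit column.
    Map<String, List<Object>> c2 = new LinkedHashMap<>();
    c2.put("c", List.of("c1", "c2"));

    System.out.println(mergeSchemas(List.of("a", "b"), List.of("c")));
    System.out.println(mergeBundles(c1, c2));
  }
}
```

In this model the merged bundle shares its columns with the two inputs instead of copying them, which mirrors the idea that C3 holds the same vectors as C1 and C2. Whether the real implementation should transfer or share vector ownership is left open here.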



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)