Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/15 22:13:04 UTC
[jira] [Created] (DRILL-5514) Enhance VectorContainer to merge two row sets
Paul Rogers created DRILL-5514:
----------------------------------
Summary: Enhance VectorContainer to merge two row sets
Key: DRILL-5514
URL: https://issues.apache.org/jira/browse/DRILL-5514
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
Fix For: 1.11.0
Consider the concept of a "record batch" in Drill. At first glance, one can envision a record batch as a stack of records:
{code}
| a1 | b1 | c1 |
----------------
| a2 | b2 | c2 |
{code}
But Drill is columnar, so a record batch is really a "bundle" of vectors, one per column:
{code}
| a1 | | b1 | | c1 |
| a2 | | b2 | | c2 |
{code}
There are times when it is handy to build up a record batch as a merge of two different vector bundles:
{code}
-- bundle 1 -- -- bundle 2 --
| a1 | | b1 | | c1 |
| a2 | | b2 | | c2 |
{code}
For example, consider a reader. The reader implementation might read columns (a, b) from a file. The {{ScanBatch}} might then add (c) as an implicit vector, such as the file name. The merged set of vectors comprises the final schema: (a, b, c).
This ticket asks for the code to do the merge:
* Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
* Merge two vector containers C1 and C2 to create a new container, C3, that holds the merger of the vectors from the first two.
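The two merge steps above can be sketched as follows. This is only an illustration under stated assumptions, not Drill code: {{VectorBundle}} is a hypothetical stand-in for {{VectorContainer}}, plain Java lists stand in for value vectors, and the rules that column names must be unique and row counts must match are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a vector container: an ordered set of named
 * columns, each column being a list of values of the same length.
 * This is NOT Drill's VectorContainer API.
 */
final class VectorBundle {
    // LinkedHashMap preserves insertion order, so the schema order is stable.
    private final Map<String, List<?>> columns = new LinkedHashMap<>();
    private int rowCount = -1; // -1 means "no columns added yet"

    /** Adds a column; the list is shared, not copied. */
    VectorBundle add(String name, List<?> values) {
        if (columns.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate column: " + name);
        }
        if (rowCount >= 0 && values.size() != rowCount) {
            throw new IllegalArgumentException("Row count mismatch for column: " + name);
        }
        rowCount = values.size();
        columns.put(name, values);
        return this;
    }

    /** Column names in order, e.g. (a, b). */
    List<String> schema() {
        return new ArrayList<>(columns.keySet());
    }

    List<?> column(String name) {
        return columns.get(name);
    }

    int rowCount() {
        return Math.max(rowCount, 0);
    }

    /**
     * Merges two bundles into a new one: the schema is the columns of
     * 'first' followed by the columns of 'second', and the new bundle
     * references the same underlying column data as the inputs.
     */
    static VectorBundle merge(VectorBundle first, VectorBundle second) {
        VectorBundle merged = new VectorBundle();
        for (Map.Entry<String, List<?>> e : first.columns.entrySet()) {
            merged.add(e.getKey(), e.getValue());
        }
        for (Map.Entry<String, List<?>> e : second.columns.entrySet()) {
            merged.add(e.getKey(), e.getValue());
        }
        return merged;
    }
}
```

With bundle 1 holding (a, b) and bundle 2 holding (c), {{VectorBundle.merge(bundle1, bundle2)}} yields a bundle whose schema is (a, b, c). The sketch shares column data between the inputs and the result rather than copying it; whether the real implementation should share or transfer vector ownership is left open here.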
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)