Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/08/14 00:31:00 UTC

[jira] [Commented] (DRILL-6683) move getSelectionVector2 and getSelectionVector4 from VectorAccessible interface to RecordBatch interface

    [ https://issues.apache.org/jira/browse/DRILL-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579069#comment-16579069 ] 

Paul Rogers commented on DRILL-6683:
------------------------------------

While this seems a good idea, it does get to the core of the design of the {{VectorContainer}} vs. {{RecordBatch}} abstractions.

Despite its name, {{RecordBatch}} is an *operator*, not a batch of data. A {{RecordBatch}} (operator) has an associated output batch of data (a record batch but not a {{RecordBatch}}) represented by a {{VectorContainer}}. Metadata for that container is described by {{BatchSchema}}, which is stored in the {{VectorContainer}}. Since a full record batch is defined by a set of vectors *and* its associated selection vector, it seems odd to disassociate them.

Rather than remove the methods from {{VectorContainer}}, a better longer-term change would be to move the selection vector into the {{VectorContainer}}. Today, it is an odd add-on maintained by the operator (the so-called {{RecordBatch}}), not by the record batch (the so-called {{VectorContainer}}).

As you've seen in the {{RowSet}} classes, a {{RowSet}} is the logical equivalent of (actually a wrapper for) both a {{VectorContainer}} and a selection vector.
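
To make the "vectors plus selection vector" idea concrete, here is a minimal sketch of a batch that owns both. It is not Drill code: {{SketchBatch}} and its methods are hypothetical names, and plain {{int[]}} arrays stand in for value vectors and for an SV2-style selection vector.

```java
// Hypothetical stand-in for a data batch; NOT Drill's VectorContainer or RowSet.
final class SketchBatch {
    private final int[] values;     // one "vector" of column data
    private final int[] selection;  // SV2-like row indirection, or null if absent

    SketchBatch(int[] values, int[] selection) {
        this.values = values;
        this.selection = selection;
    }

    // True when readers must go through the selection vector.
    boolean hasSelection() {
        return selection != null;
    }

    // Number of rows visible to a reader.
    int rowCount() {
        return selection == null ? values.length : selection.length;
    }

    // Reads a row, applying the indirection when a selection vector is present.
    int get(int row) {
        return selection == null ? values[row] : values[selection[row]];
    }
}
```

With this shape, a consumer asks the batch itself for rows and never needs to go back to the producing operator to find out whether a selection vector applies.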

Also, the newer stuff to come that builds on the result set loader splits the operator interface into three responsibilities:

* Operator
* Outgoing batch
* Iterator protocol driver

In this world, a {{RowSet}} (or the result set loader equivalent for reading) would represent the outgoing batch, and the operator would handle the work of transforming batches.
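
The three-way split above might look roughly like the sketch below. All names ({{OutBatch}}, {{BatchOperator}}, {{BatchDriver}}) are hypothetical and chosen only for illustration; they are not the interfaces of the result set loader work, and a list of integers stands in for real batch data.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical outgoing batch: holds data only, no operator logic.
final class OutBatch {
    final List<Integer> rows;

    OutBatch(List<Integer> rows) {
        this.rows = rows;
    }
}

// Hypothetical operator: its only job is to transform one batch into another.
interface BatchOperator {
    OutBatch transform(OutBatch input);
}

// Hypothetical driver: owns the iterator protocol and calls the operator per batch.
final class BatchDriver implements Iterator<OutBatch> {
    private final Iterator<OutBatch> upstream;
    private final BatchOperator operator;

    BatchDriver(Iterator<OutBatch> upstream, BatchOperator operator) {
        this.upstream = upstream;
        this.operator = operator;
    }

    @Override
    public boolean hasNext() {
        return upstream.hasNext();
    }

    @Override
    public OutBatch next() {
        // The driver pulls from upstream; the operator never sees the protocol.
        return operator.transform(upstream.next());
    }
}
```

The point of the separation is that the operator can be tested with a single input batch, while the iteration protocol lives in one place instead of being re-implemented inside every operator.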

So, a long comment, because the design in this area needs work (as this bug suggests), but the fixes are subtle.

> move getSelectionVector2 and getSelectionVector4 from VectorAccessible interface to RecordBatch interface
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6683
>                 URL: https://issues.apache.org/jira/browse/DRILL-6683
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Priority: Major
>
