Posted to issues@drill.apache.org by "Hanifi Gunes (JIRA)" <ji...@apache.org> on 2015/10/28 20:48:27 UTC

[jira] [Comment Edited] (DRILL-3987) Create a POC VV extraction

    [ https://issues.apache.org/jira/browse/DRILL-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978974#comment-14978974 ] 

Hanifi Gunes edited comment on DRILL-3987 at 10/28/15 7:47 PM:
---------------------------------------------------------------

Vectors should store values of a specific type, support append-only writes and random reads, and export convenience functions for zero-copy buffer transfer and for accessing vector metadata such as buffer size, schema, etc.
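To make that contract concrete, here is a minimal Java sketch. It uses java.nio.ByteBuffer as a stand-in for a purified ByteBuf sub-interface. Every name in it (ValueVectorSketch, IntVectorSketch, append, get, transferTo) is hypothetical and not Drill's API. It covers typed storage, append-only writes, random reads, buffer-size metadata and a zero-copy transfer that hands the buffer to another vector.

```java
import java.nio.ByteBuffer;

// Hypothetical contract distilled from the requirements above; not Drill's ValueVector API.
// V is the concrete vector type, so transfers only happen between like vectors.
interface ValueVectorSketch<V extends ValueVectorSketch<V>> {
  int getValueCount();        // number of values written so far
  int getBufferSize();        // bytes occupied by those values
  void transferTo(V target);  // hand the buffer over without copying bytes
}

// Fixed-width int vector; java.nio.ByteBuffer stands in for a "purified" ByteBuf.
final class IntVectorSketch implements ValueVectorSketch<IntVectorSketch> {
  private ByteBuffer data;
  private int valueCount;

  IntVectorSketch(int initialValueCapacity) {
    this.data = ByteBuffer.allocate(initialValueCapacity * Integer.BYTES);
  }

  // Append-only write: the only way to add a value is at the end.
  void append(int value) {
    int needed = (valueCount + 1) * Integer.BYTES;
    if (needed > data.capacity()) {
      // Grow to max(needed, 2x capacity) and copy existing values
      // (an allocator's job in a real library).
      ByteBuffer bigger = ByteBuffer.allocate(Math.max(needed, data.capacity() * 2));
      for (int i = 0; i < valueCount; i++) {
        bigger.putInt(i * Integer.BYTES, data.getInt(i * Integer.BYTES));
      }
      data = bigger;
    }
    data.putInt(valueCount * Integer.BYTES, value);
    valueCount++;
  }

  // Random read by index, bounds-checked against the written values.
  int get(int index) {
    if (index < 0 || index >= valueCount) {
      throw new IndexOutOfBoundsException("index " + index + ", count " + valueCount);
    }
    return data.getInt(index * Integer.BYTES);
  }

  @Override public int getValueCount() { return valueCount; }

  @Override public int getBufferSize() { return valueCount * Integer.BYTES; }

  // Zero-copy transfer: the target takes over the same buffer object,
  // and this vector is reset to empty so the memory has a single owner.
  @Override public void transferTo(IntVectorSketch target) {
    target.data = this.data;
    target.valueCount = this.valueCount;
    this.data = ByteBuffer.allocate(0);
    this.valueCount = 0;
  }
}
```

The transfer moves the buffer reference instead of copying bytes and resets the source, so exactly one vector owns the memory afterwards. With a reference-counted buffer, the target would also have to release whatever it held before.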

So, for the points above, we need to export:
i) a purified ByteBuf sub-interface. DrillBuf seems overly convoluted, mixing in operator, fragment context and similar operations.
ii) a subset of Drill's BufferAllocator, with Drill-specific logic such as getFragmentLimit removed.
iii) builders to instantiate vectors, writers to support append-only writes, and readers to perform random reads.
iv) Including RPC-related functionality in the base library sounds out of scope. I would model transfers as happening between vectors.

v) If we can export a vector into metadata plus a composite buffer, it would be really nice to be able to build it back again. Exporting convenience classes/methods like VectorContainer and RecordBatchLoader (which will need a better name :) would be a natural complement.
vi) I would also propose revisiting the design to abstract out a ListVector and remove the Repeated* types.
vii) [~jnadeau] In the past we had a lot of difficulty due to the mix of serialized and materialized metadata, especially with computing hash codes and with materialized fields not matching complex VV instances. At this point, I think an immutable vector descriptor, along with an immutable schema descriptor built lazily on demand (see BaseVV#getMetadataBuilder), would make sense. To me, a barebones vector descriptor is as simple as a path/name plus a type (both immutable). We should be able to create a vector using just these two. We can still keep MField for carrying metadata.
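A minimal Java sketch of the descriptor idea in point vii), under stated assumptions: VectorDescriptor, SketchVector, ContainerSketch and MinorTypeSketch are hypothetical names rather than Drill classes, and value storage is left out. It shows equality and hash codes that depend only on two immutable fields, a vector constructed from the descriptor alone, and a schema that is built lazily on demand and then cached.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical type enum; a real library would reuse its own type system.
enum MinorTypeSketch { INT, BIGINT, VARCHAR }

// Barebones immutable descriptor: a path/name plus a type, nothing else.
// equals/hashCode depend only on these two final fields, so hash codes stay stable.
final class VectorDescriptor {
  private final String name;
  private final MinorTypeSketch type;

  VectorDescriptor(String name, MinorTypeSketch type) {
    this.name = Objects.requireNonNull(name, "name");
    this.type = Objects.requireNonNull(type, "type");
  }

  String name() { return name; }
  MinorTypeSketch type() { return type; }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof VectorDescriptor)) return false;
    VectorDescriptor other = (VectorDescriptor) o;
    return name.equals(other.name) && type == other.type;
  }

  @Override public int hashCode() { return Objects.hash(name, type); }

  @Override public String toString() { return name + ":" + type; }
}

// A vector is created from its descriptor alone (value storage omitted here).
final class SketchVector {
  private final VectorDescriptor descriptor;

  SketchVector(VectorDescriptor descriptor) {
    this.descriptor = Objects.requireNonNull(descriptor, "descriptor");
  }

  VectorDescriptor descriptor() { return descriptor; }
}

// Container whose schema is an unmodifiable list built lazily on first request
// and cached until the set of vectors changes.
final class ContainerSketch {
  private final List<SketchVector> vectors = new ArrayList<>();
  private List<VectorDescriptor> schema; // null means "not built yet"

  void add(SketchVector vector) {
    vectors.add(Objects.requireNonNull(vector, "vector"));
    schema = null; // invalidate the cached schema
  }

  List<VectorDescriptor> getSchema() {
    if (schema == null) {
      List<VectorDescriptor> built = new ArrayList<>(vectors.size());
      for (SketchVector v : vectors) {
        built.add(v.descriptor());
      }
      schema = Collections.unmodifiableList(built);
    }
    return schema;
  }
}
```

Dropping the cached schema in add() keeps the lazily built list consistent with the vectors. Richer metadata of the kind MField carries could sit next to the descriptor without affecting its identity.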

Will look at this more as the PoC takes shape.




was (Author: hgunes):
For the points above,

i) 

> Create a POC VV extraction
> --------------------------
>
>                 Key: DRILL-3987
>                 URL: https://issues.apache.org/jira/browse/DRILL-3987
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>
> I'd like to start by looking at an extraction that pulls out the base concepts of:
> buffer allocation, value vectors and complexwriter/fieldreader.
> I need to figure out how to resolve some of the cross-dependency issues (such as the JDBC accessor connections).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)