Posted to issues@arrow.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/09/19 17:12:00 UTC

[jira] [Commented] (ARROW-3267) [Python] Create empty table from schema

    [ https://issues.apache.org/jira/browse/ARROW-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620873#comment-16620873 ] 

Paul Rogers commented on ARROW-3267:
------------------------------------

FWIW, ARROW-3164 describes a port of a "row set mechanism" from Apache Drill that does exactly this. There are three relevant components:

1. A fluent schema builder to define the schema.
2. The schema definition itself, which includes both scalar and "complex" types.
3. A "row set" (vector batch) builder to build vectors from the schema.

Drill found that it was helpful to have additional metadata in the schema, such as expected width for VARCHAR columns, expected cardinality for arrays, and expected types for unions.

The row set builder could then optionally allocate vector buffers at approximately the desired size, which avoided the need to repeatedly double the vectors as they were written.
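
To make the idea concrete, here is a minimal pure-Python sketch of a fluent schema builder that carries a width hint, plus an allocator that sizes buffers up front from those hints. This is not Drill's or Arrow's API: every name in it (ColumnDef, SchemaBuilder, allocate_buffers, FIXED_WIDTHS, DEFAULT_VARCHAR_WIDTH) is hypothetical, and the type names and the 50-byte fallback are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnDef:
    """One column plus an optional sizing hint (hypothetical structure)."""
    name: str
    type_name: str                         # e.g. "INT" or "VARCHAR"
    expected_width: Optional[int] = None   # expected bytes per VARCHAR value


class SchemaBuilder:
    """Fluent builder: add() returns self so that calls can be chained."""

    def __init__(self) -> None:
        self._columns: List[ColumnDef] = []

    def add(self, name: str, type_name: str,
            expected_width: Optional[int] = None) -> "SchemaBuilder":
        self._columns.append(ColumnDef(name, type_name, expected_width))
        return self

    def build(self) -> List[ColumnDef]:
        return list(self._columns)


# Assumed byte widths for fixed-width types (illustrative only).
FIXED_WIDTHS = {"INT": 4, "BIGINT": 8}
# Assumed fallback when a VARCHAR column carries no width hint.
DEFAULT_VARCHAR_WIDTH = 50


def allocate_buffers(schema: List[ColumnDef],
                     expected_rows: int) -> Dict[str, bytearray]:
    """Allocate one buffer per column, sized from the schema metadata."""
    buffers = {}
    for col in schema:
        width = FIXED_WIDTHS.get(col.type_name)
        if width is None:
            # Variable-width column: use the hint, else the fallback.
            width = col.expected_width or DEFAULT_VARCHAR_WIDTH
        buffers[col.name] = bytearray(expected_rows * width)
    return buffers


schema = (
    SchemaBuilder()
    .add("id", "INT")
    .add("name", "VARCHAR", expected_width=20)
    .build()
)
buffers = allocate_buffers(schema, expected_rows=1000)
```

Because each buffer starts at roughly its final size, a writer filling about 1000 rows would not need to grow it by repeated doubling.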

The rest of the mechanism provides a means to write to, or read from, vectors, which is beyond the scope of this particular ticket.

This ticket talks about Python, so the Java row set code is not directly applicable. Still, feel free to borrow ideas. Also, perhaps we can coordinate to establish a common approach across languages.

> [Python] Create empty table from schema
> ---------------------------------------
>
>                 Key: ARROW-3267
>                 URL: https://issues.apache.org/jira/browse/ARROW-3267
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Assignee: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.11.0
>
>
> When one knows the expected schema for one's input data but has no input data for a data pipeline, it is necessary to construct an empty table as a sentinel value to pass through.
> This is a small but often useful convenience function.


