Posted to dev@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/08/19 23:04:20 UTC

[jira] [Commented] (ARROW-264) Create an Arrow File format

    [ https://issues.apache.org/jira/browse/ARROW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429008#comment-15429008 ] 

Wes McKinney commented on ARROW-264:
------------------------------------

Looks like a good start to me. We should add some minor internal details, such as padding all byte buffers so that they start and end on 8-byte boundaries (according to the Arrow spec, the in-memory buffers will already be aligned and padded, but the serialized metadata may require padding bytes). This layout is similar to, but much more general than, what we did in Feather, which has a schema and record batch headers in a single metadata chunk, but only a single record batch and no dictionaries (https://github.com/wesm/feather/blob/master/doc/FORMAT.md).
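
To make the padding suggestion concrete, here is a minimal sketch in Python of rounding a serialized chunk up to an 8-byte boundary. The helper names (padded_length, pad_to_alignment) are hypothetical, and the choice of zero bytes as filler is an assumption; neither the proposal nor the comment specifies it.

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment.

    alignment is assumed to be a power of two (8 in the comment above).
    """
    return (nbytes + alignment - 1) & ~(alignment - 1)


def pad_to_alignment(data: bytes, alignment: int = 8) -> bytes:
    """Append filler bytes so the result length is a multiple of alignment.

    Zero bytes are used as filler here; that is an assumption, not part
    of the proposal.
    """
    return data + b"\x00" * (padded_length(len(data), alignment) - len(data))
```

A writer could apply pad_to_alignment to each serialized FlatBuffer header before appending the body, so that the body buffers that follow also start on an 8-byte boundary, provided the header itself begins at an aligned offset.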

> Create an Arrow File format
> ---------------------------
>
>                 Key: ARROW-264
>                 URL: https://issues.apache.org/jira/browse/ARROW-264
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>
> File layout:
> (DictionaryBatch, RecordBatch, Schema as defined in Message.fbs)
> {noformat}
> MAGIC:   ARROW1
> (
> DictionaryBatch:  DictionaryBatch Header (FlatBuffer)
> DictionaryBatch: DictionaryBatch Body (buffers concatenated)
> )*
> (
> RecordBatch: RecordBatch Header (FlatBuffer)
> RecordBatch: RecordBatch Body (buffers concatenated)
> )+
> Footer: FlatBuffer
> Footer length: int (4 bytes unsigned LE)
> MAGIC: ARROW1
> {noformat}
> Footer definition:
> {noformat}
> table Footer {
>   schema: org.apache.arrow.flatbuf.Schema;
>   dictionaries: [ Block ];
>   recordBatches: [ Block ];
> }
> struct Block {
>   offset: long;
>   metaDataLength: int;
>   bodyLength: long;
> }
> {noformat}
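
For illustration, a minimal sketch in Python of how a reader might locate the footer under the layout quoted above: check both MAGIC markers, read the 4-byte unsigned little-endian footer length that sits just before the trailing MAGIC, and step back by that many bytes. The function name locate_footer is hypothetical, and the sketch assumes the layout exactly as written, with no padding bytes between the footer, its length, and the trailing MAGIC. Decoding the footer FlatBuffer itself is out of scope here.

```python
import io
import struct

MAGIC = b"ARROW1"


def locate_footer(f):
    """Return (footer_offset, footer_length) for a seekable binary file.

    Assumes the trailer is: footer bytes, 4-byte unsigned LE length, MAGIC.
    """
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    trailer_size = 4 + len(MAGIC)
    if file_size < len(MAGIC) + trailer_size:
        raise ValueError("file too small to hold both magic markers")

    # Leading MAGIC at the very start of the file.
    f.seek(0)
    if f.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing leading magic")

    # Trailing footer length and MAGIC at the very end of the file.
    f.seek(file_size - trailer_size)
    trailer = f.read(trailer_size)
    if trailer[4:] != MAGIC:
        raise ValueError("missing trailing magic")

    (footer_length,) = struct.unpack("<I", trailer[:4])
    footer_offset = file_size - trailer_size - footer_length
    if footer_offset < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    return footer_offset, footer_length


# Usage with a stand-in file: 16 placeholder body bytes and a 12-byte fake footer.
fake = io.BytesIO(MAGIC + b"x" * 16 + b"F" * 12 + struct.pack("<I", 12) + MAGIC)
footer_offset, footer_length = locate_footer(fake)
```

Once the footer is decoded, each Block entry would give the offset, metadata length, and body length needed to seek directly to a dictionary or record batch without scanning the file.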


