Posted to dev@arrow.apache.org by Rares Vernica <rv...@gmail.com> on 2017/09/16 03:39:07 UTC

Chunked arrays in Feather files

Hi,

I have a question about chunks in Feather files.

A TableReader can be used to read a Column. For a column, the data is in a
ChunkedArray. For a Feather file, what is the chunk size? Can the chunk
size be modified?

Thanks!
Rares

Re: Chunked arrays in Feather files

Posted by Wes McKinney <we...@gmail.com>.
hi Rares,

I assume you're talking about this function:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/feather.h#L73

Currently in Feather files the columns only ever have one chunk, so
this function could return a contiguous arrow::Array instead. In the
future, once there are R bindings for Arrow, I would like to replace
the Feather format's ad hoc metadata and on-disk layout with the main
Arrow stream / file layout, which would allow chunking, appends,
nested data, compression, and more. See also
https://github.com/apache/arrow/blob/master/python/doc/source/ipc.rst#feather-format

The Feather API could be augmented to support chunked iteration (e.g.
like a generic stream reader) or arbitrary reads from the middle,
since internally you would be computing zero-copy slices on the
columns.
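
As a rough illustration of the zero-copy slicing idea (not the Arrow or
Feather API), here is a minimal Python sketch using only the standard
library. The name iter_chunks and the chunk_size parameter are hypothetical,
and an array.array stands in for a single contiguous column:

```python
from array import array


def iter_chunks(column, chunk_size):
    """Yield zero-copy views of at most chunk_size elements each.

    `column` is any object exposing the buffer protocol; here it stands in
    for a contiguous single-chunk column. This is an illustrative helper,
    not part of Arrow.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    view = memoryview(column)
    # Slicing a memoryview produces a new view over the same buffer,
    # so no element data is copied.
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]


# A contiguous column of ten 64-bit integers, read in chunks of four.
column = array("q", range(10))
chunks = list(iter_chunks(column, 4))
chunk_lengths = [len(c) for c in chunks]
```

Because each chunk is a view rather than a copy, a write to the underlying
column is visible through the chunk that covers it, and a read from the
middle is just a slice with a different start offset.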

- Wes

On Fri, Sep 15, 2017 at 11:39 PM, Rares Vernica <rv...@gmail.com> wrote:
> Hi,
>
> I have a question about chunks in Feather files.
>
> A TableReader can be used to read a Column. For a column, the data is in a
> ChunkedArray. For a Feather file, what is the chunk size? Can the chunk
> size be modified?
>
> Thanks!
> Rares