Posted to dev@arrow.apache.org by Andy Grove <an...@gmail.com> on 2018/04/15 17:35:04 UTC

Arrow file formats

I've started down the path of building a very simple file format for
transferring Arrow data between nodes in my project.

Somebody on Reddit quite reasonably pointed out that I should look at
Feather (which I didn't actually know about until now) and also mentioned
that it has now been deprecated in favor of some new format in the Arrow
project itself?

I'm also aware that the IPC mechanism might be suitable, but I haven't had
time to read the specs yet. I'm waiting on Google Flatbuffers support for
Rust before I start contributing IPC support, though, and I need something
usable in the meantime (and I'm happy to donate whatever I build if it is
useful).

I'd appreciate hearing opinions on where Arrow is going in terms of
defining file formats.

Thanks,

Andy.

Re: Arrow file formats

Posted by Andy Grove <an...@gmail.com>.
Hi Wes,

The IPC format looks ideal. Once again my enthusiasm is getting the better
of me. I will start looking into options for implementing this in Rust.

Thanks,

Andy.

On Sun, Apr 15, 2018 at 12:36 PM, Wes McKinney <we...@gmail.com> wrote:

> hi Andy,
>
> Is there a reason not to use the file format defined in
> https://github.com/apache/arrow/blob/master/format/IPC.md#file-format?
> We already have three implementations of this format, in Java, C++, and
> JavaScript. Is there a way you could wrap the C or C++ Flatbuffers
> headers for use in Rust until the Rust generator is ready for
> prime time? Otherwise there are a lot of wheels to reinvent.
>
> > Somebody on Reddit quite reasonably pointed out that I should look at
> > Feather (which I didn't actually know about until now) and also mentioned
> > that it has now been deprecated in favor of some new format in the Arrow
> > project itself?
>
> FYI, I've found there's quite a bit of disinformation (or
> half-information) surrounding this project on the internet. People
> routinely say things to me at conferences and elsewhere that have
> resulted from misconceptions that have been propagated via word of
> mouth or Twitter. For example, I wrote
> http://wesmckinney.com/blog/feather-arrow-future/ in an effort to
> clear up confusion about where the Feather format is going.
>
> - Wes

Re: Arrow file formats

Posted by Wes McKinney <we...@gmail.com>.
hi Andy,

Is there a reason not to use the file format defined in
https://github.com/apache/arrow/blob/master/format/IPC.md#file-format?
We already have three implementations of this format, in Java, C++, and
JavaScript. Is there a way you could wrap the C or C++ Flatbuffers
headers for use in Rust until the Rust generator is ready for
prime time? Otherwise there are a lot of wheels to reinvent.
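
To give a rough idea of the framing that file format uses, here is a
minimal sketch in Python. It assumes the layout described in IPC.md: a
leading "ARROW1" magic string, padding, the streaming format, a Flatbuffers
footer, a little-endian int32 footer length, and a trailing "ARROW1" magic
string. The function name read_footer_size is hypothetical, and the sketch
only checks the outer framing; it does not parse the Flatbuffers footer.

```python
import struct

# Magic string assumed to open and close an Arrow IPC file (per IPC.md).
MAGIC = b"ARROW1"


def read_footer_size(data: bytes) -> int:
    """Check the outer framing of an Arrow IPC file image.

    Returns the footer length stored just before the trailing magic
    string, or raises ValueError if the framing looks wrong.
    (Hypothetical helper for illustration only.)
    """
    # Smallest possible image: two magic strings plus the int32 length.
    min_len = 2 * len(MAGIC) + 4
    if len(data) < min_len:
        raise ValueError("too short to be an Arrow IPC file")
    if not data.startswith(MAGIC) or not data.endswith(MAGIC):
        raise ValueError("missing ARROW1 magic string")

    # The footer length is assumed to be a little-endian int32 that sits
    # immediately before the trailing magic string.
    size_bytes = data[-len(MAGIC) - 4 : -len(MAGIC)]
    (size,) = struct.unpack("<i", size_bytes)
    if size < 0 or size > len(data) - min_len:
        raise ValueError("footer length does not fit inside the file")
    return size
```

A real reader would go on to decode the footer with Flatbuffers to find
the schema and the record batch offsets, which is the part that needs the
Flatbuffers support discussed above.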

> Somebody on Reddit quite reasonably pointed out that I should look at
> Feather (which I didn't actually know about until now) and also mentioned
> that it has now been deprecated in favor of some new format in the Arrow
> project itself?

FYI, I've found there's quite a bit of disinformation (or
half-information) surrounding this project on the internet. People
routinely say things to me at conferences and elsewhere that have
resulted from misconceptions that have been propagated via word of
mouth or Twitter. For example, I wrote
http://wesmckinney.com/blog/feather-arrow-future/ in an effort to
clear up confusion about where the Feather format is going.

- Wes
