Posted to user@arrow.apache.org by MattK <hw...@gmail.com> on 2018/12/05 16:28:08 UTC

Writing Parquet in Go via Arrow?

Pure Go Parquet libraries are currently a bit slower than we need, and
using Arrow might be an option.

Are there any examples of using Arrow in Go to create and write Parquet in
code where the data has already been read and typed?

Re: Writing Parquet in Go via Arrow?

Posted by Wes McKinney <we...@gmail.com>.
AFAIK no one has yet implemented an adapter between the Arrow columnar
data structures and a Parquet file writer. This would be a useful
thing to do, though, and Arrow is an ideal tool to represent
deserialized Parquet files in memory. We're doing this in C++, for
example.
On Wed, Dec 5, 2018 at 10:28 AM MattK <hw...@gmail.com> wrote:
>
> Pure Go Parquet libraries are currently a bit slower than we need, and using Arrow might be an option.
>
> Are there any examples of using Arrow in Go to create and write Parquet in code where the data has already been read and typed?
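
[Editor's illustration] The "Arrow columnar data structures" mentioned in the reply above can be pictured with a minimal standard-library Go sketch. It turns already-typed rows into a contiguous values slice plus a validity bitmap, which is the general shape of an Arrow primitive array. The names Row, Float64Column and ToColumns are hypothetical and are not part of the Arrow Go API; this is only a sketch of the layout, not an adapter to a Parquet writer.

```go
package main

import "fmt"

// Row is a hypothetical, already-typed record (an assumption for illustration).
type Row struct {
	ID    int64
	Score *float64 // nil means the value is missing
}

// Float64Column is a simplified columnar layout: a contiguous values slice
// plus a validity bitmap in which bit i is set when Values[i] is non-null.
// It mirrors the idea behind Arrow primitive arrays but is NOT the Arrow Go API.
type Float64Column struct {
	Values   []float64
	Validity []byte
	Nulls    int
}

// Append adds one value (or a null when v is nil) to the column.
func (c *Float64Column) Append(v *float64) {
	i := len(c.Values)
	if i%8 == 0 {
		// Start a new bitmap byte every 8 values.
		c.Validity = append(c.Validity, 0)
	}
	if v == nil {
		// Nulls still occupy a slot so that indexes stay aligned.
		c.Values = append(c.Values, 0)
		c.Nulls++
		return
	}
	c.Values = append(c.Values, *v)
	c.Validity[i/8] |= 1 << uint(i%8)
}

// IsValid reports whether the value at index i is non-null.
func (c *Float64Column) IsValid(i int) bool {
	return c.Validity[i/8]&(1<<uint(i%8)) != 0
}

// Columns is the column-oriented counterpart of a []Row.
type Columns struct {
	IDs    []int64
	Scores Float64Column
}

// ToColumns pivots row-oriented data into per-field columns.
func ToColumns(rows []Row) Columns {
	var cols Columns
	cols.IDs = make([]int64, 0, len(rows))
	for _, r := range rows {
		cols.IDs = append(cols.IDs, r.ID)
		cols.Scores.Append(r.Score)
	}
	return cols
}

func main() {
	a, b := 1.5, 2.5
	rows := []Row{{ID: 1, Score: &a}, {ID: 2, Score: nil}, {ID: 3, Score: &b}}
	cols := ToColumns(rows)
	fmt.Println(cols.IDs, cols.Scores.Values, cols.Scores.Nulls)
}
```

A Parquet writer consumes data column by column, so a layout of this kind is what an adapter would hand over. The encoding and file-writing side is deliberately left out here.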

Re: Writing Parquet in Go via Arrow?

Posted by MattK <hw...@gmail.com>.
It was indeed parquet-go, which works well but, being pure Go, has some
performance limits.

At our volume and with our usage patterns, Python using Arrow is actually
faster, apparently due to the C++ code for writing Parquet via Arrow.

Thus using Arrow in Go could give us another path to better performance.

On Wed, Dec 5, 2018 at 3:45 PM Sebastien Binet <bi...@cern.ch> wrote:

> hi,
>
> On Wed, Dec 5, 2018 at 5:29 PM MattK <hw...@gmail.com> wrote:
>
>> Pure Go Parquet libraries are currently a bit slower than we need, and
>> using Arrow might be an option.
>>
>> Are there any examples of using Arrow in Go to create and write Parquet
>> in code where the data has already been read and typed?
>>
>
> like Wes, I am not aware of any body of work around this.
>
> out of curiosity, what were the Parquet packages you tried?
>
> at some point I tried to merge parquet-go with Apache Arrow:
> - https://github.com/xitongsys/parquet-go/issues/91
>
> was it this package?
>
> -s
>

Re: Writing Parquet in Go via Arrow?

Posted by Sebastien Binet <bi...@cern.ch>.
hi,

On Wed, Dec 5, 2018 at 5:29 PM MattK <hw...@gmail.com> wrote:

> Pure Go Parquet libraries are currently a bit slower than we need, and
> using Arrow might be an option.
>
> Are there any examples of using Arrow in Go to create and write Parquet in
> code where the data has already been read and typed?
>

like Wes, I am not aware of any body of work around this.

out of curiosity, what were the Parquet packages you tried?

at some point I tried to merge parquet-go with Apache Arrow:
- https://github.com/xitongsys/parquet-go/issues/91

was it this package?

-s