Posted to user@arrow.apache.org by Jason Sachs <jm...@gmail.com> on 2020/11/12 21:44:21 UTC

How to write Arrow Table to Parquet file in Java

The Python examples in https://arrow.apache.org/docs/python/parquet.html are wonderful and make it really easy to get started; in particular, this one:

    import pyarrow.parquet as pq

    # `table` is an existing pyarrow.Table
    writer = pq.ParquetWriter('example2.parquet', table.schema)
    for i in range(3):
        writer.write_table(table)
    writer.close()

How would I do something similar in Java? Arrow and Parquet libraries don't seem to know about one another.

I have looked a little bit at the Javadocs at https://www.javadoc.io/doc/org.apache.parquet/parquet-column/1.10.0/index.html but my head is spinning. (Although, for the record, most of my work is in Python, and a coworker is handling the Java side... he is only slightly less confused, though.)

Re: How to write Arrow Table to Parquet file in Java

Posted by Micah Kornfield <em...@gmail.com>.
I don't think there is an official Apache library in Java that supports
writing/reading Arrow data to/from Parquet.

If you are looking to interchange Arrow data between Java and Python, your
best bet is to use the native Arrow file format (Java doesn't support
compression options yet).

-Micah

On Thu, Nov 12, 2020 at 3:23 PM Chris Nuernberger <ch...@techascent.com>
wrote:

> We use Clojure and have a dataframe library that does this:
>
> https://github.com/techascent/tech.ml.dataset/

Re: How to write Arrow Table to Parquet file in Java

Posted by Chris Nuernberger <ch...@techascent.com>.
We use Clojure and have a dataframe library that does this:

https://github.com/techascent/tech.ml.dataset/
