Posted to user@arrow.apache.org by Manoj Karthick <ma...@ymail.com> on 2021/02/05 20:28:32 UTC

[Rust] [Parquet] Combining Parquet files

Hi,

I've been playing around with the Rust Parquet library and was trying to
understand how to combine Parquet files. I'm new to Rust and the Arrow
ecosystem, so I'd appreciate some help in figuring this out.

I'm looking for a way to naively merge Parquet files. For example, if we
have input files: A, B, C - I would like to create an output Parquet file
that has the row groups from A, B and C placed one after the other (I
understand this might be inefficient, but this is mostly for development
purposes).

What would be the best way to achieve this? Also how should the
FileMetaData be updated to reflect the new row groups and number of rows?

Thank you!

Re: [Rust] [Parquet] Combining Parquet files

Posted by Manoj Karthick <ma...@ymail.com>.
Thank you. The guide was really helpful!

On Sat, Feb 6, 2021 at 4:23 AM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hi,
>
> Have a look at this
>
> https://elferherrera.github.io/arrow_guide/reading_parquet.html
>
> It may give you an idea of the things you want to do

Re: [Rust] [Parquet] Combining Parquet files

Posted by Fernando Herrera <fe...@gmail.com>.
Hi,

Have a look at this

https://elferherrera.github.io/arrow_guide/reading_parquet.html

It may give you an idea of the things you want to do
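
On the FileMetaData part of the question: in the Parquet format the
footer's FileMetaData carries a total num_rows and the list of row
groups, and each row group carries its own num_rows. As far as I know,
the Rust writer produces the footer itself when the file is closed, so
it is usually not edited by hand. The sketch below only models the
bookkeeping a naive merge implies. It uses the standard library only;
RowGroupMeta, FileMeta, byte_len and merge_metadata are hypothetical
names for illustration and are not types or functions from the parquet
crate. It also assumes all inputs share the same schema and that row
groups are laid out back to back after the 4-byte "PAR1" magic.

```rust
// Hypothetical, simplified stand-ins for Parquet footer metadata.
// These are NOT the types from the `parquet` crate.
#[derive(Debug, Clone, PartialEq)]
pub struct RowGroupMeta {
    pub num_rows: i64,
    pub byte_len: i64,    // bytes the row group occupies on disk (model field)
    pub file_offset: i64, // absolute position of the row group in its file
}

#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub schema: Vec<String>, // column names only, enough for the sketch
    pub num_rows: i64,
    pub row_groups: Vec<RowGroupMeta>,
}

// Length of the "PAR1" magic at the start of a Parquet file.
pub const MAGIC_LEN: i64 = 4;

/// Build the metadata of a file that holds the row groups of all
/// inputs, in input order.
pub fn merge_metadata(inputs: &[FileMeta]) -> Result<FileMeta, String> {
    let first = inputs
        .first()
        .ok_or_else(|| "no input files".to_string())?;

    let mut merged = FileMeta {
        schema: first.schema.clone(),
        num_rows: 0,
        row_groups: Vec::new(),
    };
    let mut next_offset = MAGIC_LEN;

    for (i, file) in inputs.iter().enumerate() {
        // Row groups can only be placed side by side if the schemas match.
        if file.schema != merged.schema {
            return Err(format!("input {} has a different schema", i));
        }
        for rg in &file.row_groups {
            // Offsets are absolute, so they are recomputed for the new file.
            merged.row_groups.push(RowGroupMeta {
                num_rows: rg.num_rows,
                byte_len: rg.byte_len,
                file_offset: next_offset,
            });
            // The file-level row count is the sum over all row groups.
            merged.num_rows += rg.num_rows;
            next_offset += rg.byte_len;
        }
    }
    Ok(merged)
}
```

Calling merge_metadata(&[a, b, c]) with the metadata of A, B and C
gives one FileMeta whose row_groups are those of A, then B, then C,
and whose num_rows is the sum of the three. A real merge also has to
copy or re-encode the column data itself, which this model leaves out.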
