Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2016/04/21 00:51:36 UTC

Parquet sync up

It is happening at 4pm PT on Google Hangouts:
https://plus.google.com/hangouts/_/event/parquet_sync_up


-- 
Julien

RE: Parquet sync up

Posted by "Zheng, Kai" <ka...@intel.com>.
Sounds like a lot of great work!

>> Performance benchmarks for Write
I know write performance is critical for operations like storing a table in Parquet format in some frameworks (like Impala). Beyond that, are there any cases during query execution that run into Parquet writes and would benefit from a faster write path?

Sorry if the question is stupid. Thanks in advance.

Regards,
Kai

-----Original Message-----
From: Uwe Korn [mailto:uwelk@xhochy.com] 
Sent: Thursday, April 21, 2016 8:32 PM
To: dev@parquet.apache.org
Subject: Re: Parquet sync up

Hello,

Since I'm in Europe, this is a very inconvenient time for me, so I'd rather write a longer mail instead of joining. As a bit of input, here is what I'm working on at the moment:

  * Write support in a basic form for parquet-cpp (no compression, fixed encodings, excessive memory usage, ...) is nearly done. I hope to open the final PR for discussion next week.
  * Remaining tasks before I open the PR:
    * A bit of code cleanup
    * Going through the API again to make it consistent
    * Metadata for RowGroups and ColumnChunks

Afterwards I would look into one of the following tasks w.r.t. parquet-cpp:
  * WriterProperties to specify compression, encoding, etc. on a global and per-column basis.
  * Performance benchmarks for Write
  * Integration of Parquet support in Apache Arrow to use it with Python
  * Reducing the memory usage of the initial Writer implementation (for this we probably need to extend the encoders a bit)

If anyone else is also looking into this, I'm happy to collaborate ;)

Cheers
Uwe

On 21.04.16 00:51, Julien Le Dem wrote:
> It is happening at 4pm PT on Google Hangouts: 
> https://plus.google.com/hangouts/_/event/parquet_sync_up
>
>


Re: Parquet sync up

Posted by Wes McKinney <we...@cloudera.com>.
I'm sorry that I'm not able to join either, due to international travel
(and also the European time zone), but my interests are very much in
line with Uwe's and I look forward to continuing to work together with
him, Deepak, and Aliaksei on parquet-cpp. We should start a
conversation on the ML about the C++ roadmap next week, which should
line up well with Uwe's progress on the write path.

Thanks
Wes


Re: Parquet sync up

Posted by Uwe Korn <uw...@xhochy.com>.
Hello,

Since I'm in Europe, this is a very inconvenient time for me, so I'd
rather write a longer mail instead of joining. As a bit of input, here
is what I'm working on at the moment:

  * Write support in a basic form for parquet-cpp (no compression, fixed 
encodings, excessive memory usage, ...) is nearly done. I hope to open 
the final PR for discussion next week.
  * Remaining tasks before I open the PR:
    * A bit of code cleanup
    * Going through the API again to make it consistent
    * Metadata for RowGroups and ColumnChunks

Afterwards I would look into one of the following tasks w.r.t. parquet-cpp:
  * WriterProperties to specify compression, encoding, etc. on a global 
and per-column basis.
  * Performance benchmarks for Write
  * Integration of Parquet support in Apache Arrow to use it with Python
  * Reducing the memory usage of the initial Writer implementation 
(for this we probably need to extend the encoders a bit)

If anyone else is also looking into this, I'm happy to collaborate ;)

Cheers
Uwe

On 21.04.16 00:51, Julien Le Dem wrote:
> It is happening at 4pm PT on Google Hangouts:
> https://plus.google.com/hangouts/_/event/parquet_sync_up
>
>