Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2016/11/10 16:51:56 UTC
parquet sync up today at 10PT (in 1 hour)
Reminder that the Parquet sync-up will be in 1 hour, at 10am PT, on Hangouts:
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
--
Julien
Re: parquet sync up today at 10PT (in 1 hour)
Posted by Julien Le Dem <ju...@dremio.com>.
Thank you for correcting!
On Thursday, November 10, 2016, Ryan Blue <rb...@netflix.com.invalid> wrote:
> I have a slight correction for the Brotli encoding numbers. The 20% size
> decrease incurred a 2.5% increase in compression time (using brotli-5),
> while the 15% size decrease had a 12% encoding time *decrease* (using
> brotli-4). We've decided to use brotli-5 for tables that are read a lot,
> and brotli-4 for most other tables.
--
Julien
Re: parquet sync up today at 10PT (in 1 hour)
Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I have a slight correction for the Brotli encoding numbers. The 20% size
decrease incurred a 2.5% increase in compression time (using brotli-5),
while the 15% size decrease had a 12% encoding time *decrease* (using
brotli-4). We've decided to use brotli-5 for tables that are read a lot,
and brotli-4 for most other tables.
--
Ryan Blue
Software Engineer
Netflix
Re: parquet sync up today at 10PT (in 1 hour)
Posted by Julien Le Dem <ju...@dremio.com>.
Attendees/agenda:
Zoltan (Cloudera):
- Parquet tools questions
Piyush (Twitter):
- planning encoding optimizations
Uwe:
- release parquet-cpp
- license/notice questions
Wes (Two Sigma):
- working on arrow
- helping with the parquet-cpp release
Deepak (HP/Vertica):
- read/write parquet-cpp
- discuss statistics (PARQUET-686), timestamps/...
Ryan (Netflix):
- 1.9.0 release is out.
- statistics
Julien (Dremio):
- Parquet-Arrow integration
Notes:
Parquet-tools:
- when Hadoop jars are missing from the classpath, the error message is unhelpful
- 1.6 used to bundle Hadoop
- 1.9 requires adding Hadoop to the classpath
- Ryan has a new CLI tool
parquet-cpp release:
- need to add attributions to the NOTICE files
- merge script came from the Spark project (Apache License 2.0)
- some code came from Impala (Apache License 2.0)
- need to track the files imported from Impala
- Wes to document.
- Zoltan to look into moving copyright notices to NOTICE
Statistics:
- Revisit the signed/unsigned stats approach
- instead, add information on how the min/max were obtained (collation)
- collation should follow a standard; we’re going to implement only a
subset.
- JIRA PARQUET-686
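The statistics point is easier to see with a concrete case. Below is a minimal sketch, assuming UTF-8 strings stored as byte arrays: comparing bytes as signed versus unsigned values yields different min/max for the same data, so a reader can only use min/max to skip data if it knows which ordering produced them. `SortOrder`, `ColumnStats`, and the helper functions are hypothetical names for illustration, not the parquet-mr or parquet-cpp API. The single sort-order tag is also a simplification of the collation idea in the notes.

```python
from dataclasses import dataclass
from enum import Enum
from functools import cmp_to_key


def compare_signed(a: bytes, b: bytes) -> int:
    """Lexicographic comparison treating each byte as a signed int8 (-128..127)."""
    for x, y in zip(a, b):
        sx = x - 256 if x > 127 else x
        sy = y - 256 if y > 127 else y
        if sx != sy:
            return -1 if sx < sy else 1
    # All shared bytes are equal: the shorter value sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_unsigned(a: bytes, b: bytes) -> int:
    """Lexicographic comparison treating bytes as 0..255 (Python's native bytes order)."""
    return (a > b) - (a < b)


class SortOrder(Enum):
    SIGNED = "signed"
    UNSIGNED = "unsigned"


@dataclass(frozen=True)
class ColumnStats:
    min_value: bytes
    max_value: bytes
    sort_order: SortOrder  # hypothetical field: records how min/max were computed


def compute_stats(values, order: SortOrder) -> ColumnStats:
    cmp = compare_signed if order is SortOrder.SIGNED else compare_unsigned
    key = cmp_to_key(cmp)
    return ColumnStats(min(values, key=key), max(values, key=key), order)


def can_skip_unsigned_range(stats: ColumnStats, lower: bytes, upper: bytes) -> bool:
    """True only if no value in [lower, upper] (unsigned byte order) can be in this chunk."""
    if stats.sort_order is not SortOrder.UNSIGNED:
        # Stats were computed under a different ordering, so they cannot be used here.
        return False
    return stats.max_value < lower or stats.min_value > upper


values = [b"abc", "é".encode("utf-8")]  # 'é' is b'\xc3\xa9' in UTF-8
signed = compute_stats(values, SortOrder.SIGNED)
unsigned = compute_stats(values, SortOrder.UNSIGNED)
```

With signed comparison the 0xC3 lead byte counts as negative, so `signed` reports `b"\xc3\xa9"` as the min and `b"abc"` as the max, the reverse of `unsigned`. A reader that applied unsigned range checks to the signed stats would wrongly skip a chunk that contains "é"; the sort-order tag lets it refuse to prune instead.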
int96:
- deprecate write of int96 (Ryan to look into it)
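For context on the int96 item (not from the meeting itself): INT96 timestamps as written by Impala and Hive are commonly laid out as 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. A common alternative is a plain INT64 count of microseconds since the Unix epoch. The sketch below converts between the two, assuming that layout. The function names are hypothetical and not part of any Parquet library.

```python
# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_epoch_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp, assuming the Impala/Hive layout:
    bytes 0-7 are nanoseconds within the day, bytes 8-11 the Julian day (both little-endian)."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day = int.from_bytes(raw[0:8], "little", signed=True)
    julian_day = int.from_bytes(raw[8:12], "little", signed=True)
    return (julian_day - JULIAN_DAY_OF_EPOCH) * MICROS_PER_DAY + nanos_of_day // 1_000


def epoch_micros_to_int96(micros: int) -> bytes:
    """Encode microseconds since the epoch into the same assumed INT96 layout."""
    # divmod floors, so micros_of_day is never negative, even for pre-1970 timestamps.
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    julian_day = days + JULIAN_DAY_OF_EPOCH
    return (micros_of_day * 1_000).to_bytes(8, "little", signed=True) + julian_day.to_bytes(
        4, "little", signed=True
    )


# One day and one second after the epoch survives a round trip.
raw = epoch_micros_to_int96(86_401_000_000)
```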
New Encodings/compression:
- Brotli compression: 20% decrease in size, 25% increase in encoding
time; other setting: 15%/12% (compared to gzip). Ryan to update the PR.
- needs parquet-cpp integration as well (Uwe)
- PARQUET-682: specify encoding per column. Piyush to update the PR
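PARQUET-682 is about letting writers choose the encoding per column. Here is a minimal sketch of that idea, assuming a default encoding plus per-column overrides keyed by column path. `HypotheticalWriterConfig` and its fields are invented for illustration and are not the configuration API proposed in the PR. PLAIN and DELTA_BINARY_PACKED are encoding names from the Parquet format spec.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HypotheticalWriterConfig:
    """Hypothetical writer settings: a default encoding plus per-column overrides."""

    default_encoding: str = "PLAIN"
    # Column path -> encoding name.
    column_encodings: Dict[str, str] = field(default_factory=dict)

    def encoding_for(self, column_path: str) -> str:
        # Use the per-column override when present, otherwise fall back to the default.
        return self.column_encodings.get(column_path, self.default_encoding)


config = HypotheticalWriterConfig(
    column_encodings={"event.timestamp": "DELTA_BINARY_PACKED"},
)
```

Keeping a single default with sparse overrides means only the columns someone has tuned need an entry. Every other column keeps the writer's usual behavior.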
--
Julien
Re: parquet sync up today at 10PT (in 1 hour)
Posted by Julien Le Dem <ju...@dremio.com>.
starting now
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
--
Julien