Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/06/07 16:53:31 UTC
Parquet sync starting in 10 min
10am PT on google hangout:
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
Reminder that this is open to all.
Here is how it goes:
- we do a "round table" of people present where they quickly introduce
themselves and state the topics they wish to discuss, if any (being a "fly
on the wall" is totally fine too)
- based on that first round we summarize the agenda and go over the topics
one by one. (can be just bringing attention of people to a PR that needs a
review or asking if it makes sense to implement some new feature)
- In the end we send notes back to the list and follow ups happen on JIRA,
github PRs and the dev list.
- if the time is inconvenient to you say so on the list and we can figure
out something.
--
Julien
Re: Parquet sync starting in 10 min
Posted by Julien Le Dem <ju...@dremio.com>.
Notes:
Attendees/agenda building:
Zoltan (Cloudera):
- timestamp, min/max
Anna (Cloudera)
Deepak (Vertica):
- timestamp
- c++/java: bloom filter.
Lars (Cloudera Impala)
- page skipping indexes
- open PRs
Pooja (Cloudera Impala):
- page skipping indexes
Julien (Dremio):
- page skipping indexes
- timestamp
Agenda:
- open PRs
TODO (all): review:
- https://github.com/apache/parquet-format/pull/54
- https://github.com/apache/parquet-mr/pull/414
- https://github.com/apache/parquet-mr/pull/411
- https://github.com/apache/parquet-mr/pull/413
- https://github.com/apache/parquet-mr/pull/410
TODO:
follow up (Julien, Lars, Ryan): https://github.com/apache/parquet-format/pull/53
Ryan follow up: https://github.com/apache/parquet-format/pull/51
Julien more tests: https://github.com/apache/parquet-format/pull/50
Ryan follow up: https://github.com/apache/parquet-format/pull/49
- PR triage:
- TODO: Lars to do a pass on parquet-format
- TODO: Julien to do a pass on parquet-mr
- timestamps:
- When reading from Parquet to Arrow: if the timestamp is adjusted to UTC,
we use the UTC timezone in Arrow; otherwise we use no timezone (timestamp
without timezone)
- follow up on jira about timestamp with timezone: PARQUET-906
- min/max: PARQUET-686
- final conclusion: https://github.com/apache/parquet-format/pull/46
- PARQUET-839 => duplicate of PARQUET-686
- TODO close obsolete PRs:
- https://github.com/apache/parquet-format/pull/42
- https://github.com/apache/parquet-mr/pull/362
- We need an implementation in parquet-mr for the metadata in
https://github.com/apache/parquet-format/pull/46
- TODO: Zoltan to open a jira
- Impala has an implementation; we should test that the two are compatible
- bloom filter
- PARQUET-319: see linked PR and doc.
- https://github.com/apache/parquet-format/pull/28
- https://docs.google.com/document/d/1mIZ0W24Cr79QHJWN1sQ3dIUc4lAK5AVqozwSwtpFhW8/edit#heading=h.hmt1hrab3fpc
- TODO: review and give feedback
- page skipping indexes
- The plan is to prototype a writer in Impala, then a reader.
- We’ll review the results to finalize the metadata in 5-6 weeks.
- dealing with statistics coming from parquet-cpp
- new min/max_value fields will be the reference
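
For reference, the timestamp rule from the notes above (adjusted to UTC gives the UTC timezone in Arrow, otherwise no timezone) can be sketched roughly as follows. This is only an illustration of the rule as discussed, not code from parquet-mr, parquet-cpp or Arrow: the function names, the is_adjusted_to_utc parameter and the printed type strings are all hypothetical.

```python
from typing import Optional


def arrow_timezone(is_adjusted_to_utc: bool) -> Optional[str]:
    """Pick the timezone for the Arrow timestamp (hypothetical helper).

    Assumption: the Parquet timestamp carries a boolean saying whether its
    values are adjusted to UTC. If so, the Arrow side uses "UTC"; otherwise
    the Arrow timestamp has no timezone at all.
    """
    return "UTC" if is_adjusted_to_utc else None


def describe_arrow_type(is_adjusted_to_utc: bool, unit: str = "us") -> str:
    """Render a readable description of the resulting Arrow type.

    The string format here is illustrative only.
    """
    tz = arrow_timezone(is_adjusted_to_utc)
    if tz is None:
        return f"timestamp[{unit}]"
    return f"timestamp[{unit}, tz={tz}]"


if __name__ == "__main__":
    print(describe_arrow_type(True))
    print(describe_arrow_type(False))
```

The point of returning None rather than some default zone is that "no timezone" stays distinguishable from "UTC" on the Arrow side.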
--
Julien
Re: Parquet sync starting in 10 min
Posted by Wes McKinney <we...@gmail.com>.
Sorry, I was unable to join the sync today. I'm interested to discuss
more my comments on
https://github.com/apache/parquet-format/pull/51#discussion_r119911623
I'll wait for the notes from the call and maybe we can continue the
discussion on GitHub