Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/06/07 16:53:31 UTC

Parquet sync starting in 10 min

10am PT on Google Hangout:
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up

Reminder that this is open to all.
Here is how it goes:
- We do a "round table" of the people present, where they quickly introduce
themselves and state the topics they wish to discuss (if any; being a "fly
on the wall" is totally fine too).
- Based on that first round, we summarize the agenda and go over the topics
one by one. (A topic can be as simple as drawing attention to a PR that needs
a review, or asking whether it makes sense to implement some new feature.)
- At the end, we send notes back to the list, and follow-ups happen on JIRA,
GitHub PRs and the dev list.
- If the time is inconvenient for you, say so on the list and we can figure
something out.

-- 
Julien

Re: Parquet sync starting in 10 min

Posted by Julien Le Dem <ju...@dremio.com>.
Notes:
Attendees/agenda building:
Zoltan (Cloudera):
 - timestamp, min/max
Anna (Cloudera)
Deepak (Vertica):
 - timestamp
 - C++/Java: bloom filter
Lars (Cloudera Impala):
 - page skipping indexes
 - open PRs
Pooja (Cloudera Impala):
 - page skipping indexes
Julien (Dremio):
 - page skipping indexes
 - timestamp


Agenda:
 - open PRs
  TODO (all): review:
   - https://github.com/apache/parquet-format/pull/54
   - https://github.com/apache/parquet-mr/pull/414
   - https://github.com/apache/parquet-mr/pull/411
   - https://github.com/apache/parquet-mr/pull/413
   - https://github.com/apache/parquet-mr/pull/410
  TODO:
    follow up (Julien, Lars, Ryan): https://github.com/apache/parquet-format/pull/53
    Ryan follow up: https://github.com/apache/parquet-format/pull/51
    Julien more tests: https://github.com/apache/parquet-format/pull/50
    Ryan follow up: https://github.com/apache/parquet-format/pull/49
 - PR triage:
   - TODO: Lars to do a pass on parquet-format
   - TODO: Julien to do a pass on parquet-mr
 - timestamps:
   - When reading from Parquet to Arrow: if the timestamp is adjusted to UTC
(isAdjustedToUTC), we use the UTC timezone in Arrow; otherwise no timezone
(timestamp without timezone).
   - Follow up on JIRA about timestamp with timezone: PARQUET-906
 - min/max: PARQUET-686
   - final conclusion: https://github.com/apache/parquet-format/pull/46
   - PARQUET-839 => duplicate of PARQUET-686
   - TODO close obsolete PRs:
      - https://github.com/apache/parquet-format/pull/42
      - https://github.com/apache/parquet-mr/pull/362
   - We need an implementation in parquet-mr for the metadata in
https://github.com/apache/parquet-format/pull/46
      - TODO: Zoltan to open a JIRA
      - Impala has an implementation; we should test that the two are
compatible
 - bloom filter
   - PARQUET-319: see linked PR and doc.
      - https://github.com/apache/parquet-format/pull/28
      - https://docs.google.com/document/d/1mIZ0W24Cr79QHJWN1sQ3dIUc4lAK5AVqozwSwtpFhW8/edit#heading=h.hmt1hrab3fpc
      - TODO: review and give feedback
 - page skipping indexes
    - The plan is to prototype a writer in Impala, then a reader.
    - We’ll review the results to finalize the metadata in 5-6 weeks.
 - dealing with statistics coming from parquet-cpp
    - The new min_value/max_value fields will be the reference.
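
For reference, the timestamp rule from the notes above can be sketched
roughly as follows. This is only an illustration of the mapping as discussed
on the call, not the actual reader code: the function name
arrow_timezone_for is hypothetical, and the flag is assumed to be available
as a plain boolean.

```python
from typing import Optional


def arrow_timezone_for(is_adjusted_to_utc: bool) -> Optional[str]:
    """Pick the Arrow timezone for a Parquet timestamp column.

    Hypothetical helper, not part of any Parquet or Arrow API.
    - adjusted to UTC     -> Arrow timestamp with timezone "UTC"
    - not adjusted to UTC -> Arrow timestamp without timezone (None)
    """
    return "UTC" if is_adjusted_to_utc else None


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, "->", arrow_timezone_for(flag))
```

Any other source timezone is out of scope here; that case is what the
PARQUET-906 follow-up is about.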


On Wed, Jun 7, 2017 at 10:54 AM, Wes McKinney <we...@gmail.com> wrote:

> Sorry, I was unable to join the sync today. I'd like to discuss my
> comments on the following further:
>
> https://github.com/apache/parquet-format/pull/51#discussion_r119911623
>
> I'll wait for the notes from the call and maybe we can continue the
> discussion on GitHub



-- 
Julien

Re: Parquet sync starting in 10 min

Posted by Wes McKinney <we...@gmail.com>.
Sorry, I was unable to join the sync today. I'd like to discuss my
comments on the following further:

https://github.com/apache/parquet-format/pull/51#discussion_r119911623

I'll wait for the notes from the call and maybe we can continue the
discussion on GitHub
