Posted to general@incubator.apache.org by Chris Aniszczyk <ca...@gmail.com> on 2014/05/23 01:10:55 UTC
[RESULT][VOTE] Accept Parquet into the Incubator

Forgot to send this email earlier (thanks Henry).

With 19 +1 votes (and 10 as binding votes), I'll consider this vote a
success.

+1 votes (binding)
Todd Lipcon
Henry Saputra
Lewis John McGibbney
Chris Mattmann
Jake Farrell
Arvind Prabhakar
Mark Struberg
Andrei Savu
Andrew Purtell
Roman Shaposhnik

+1 votes (non-binding)
Jarek Jarcec Cecho
Timothy Chen
Olivier Lamy
Hitesh Shah
Bertrand Delacretaz
Tom White
Brock Noland
Julien Le Dem
Hyunsik Choi

Onwards to: http://incubator.apache.org/projects/parquet.html

On Thu, May 22, 2014 at 3:49 PM, Henry Saputra <he...@gmail.com> wrote:

> Hi Chris, could you re-send the VOTE result tally with the subject
> prefixed with [RESULT]?
>
>
> - Henry
>
> On Wed, May 21, 2014 at 3:56 PM, Chris Aniszczyk <ca...@gmail.com> wrote:
> > With 18 +1 votes (and 10+ as binding votes), I'll consider this vote a
> > success.
> >
> > I'll proceed with the next steps.
> >
> > Thank you!
> >
> >
> >
> > On Sun, May 18, 2014 at 3:57 PM, Todd Lipcon <to...@cloudera.com> wrote:
> >
> >> +1 from me (the proposed Champion)
> >>
> >> -Todd
> >>
> >>
> >> On Sun, May 18, 2014 at 2:15 PM, Chris Aniszczyk <caniszczyk@gmail.com> wrote:
> >>
> >> > Based on the results of the discussion thread:
> >> >
> >> > http://mail-archives.apache.org/mod_mbox/incubator-general/201405.mbox/%3CCAJg1wMRGhLu4P7LeVQB%2B5K0C-fr-pw2448uj%3D6-3zHag4F1EbA%40mail.gmail.com%3E
> >> >
> >> > I would like to call a vote on accepting Parquet into the incubator.
> >> > https://wiki.apache.org/incubator/ParquetProposal
> >> >
> >> > [ ] +1 Accept Parquet into the Incubator
> >> > [ ] +0 Indifferent to the acceptance of Parquet
> >> > [ ] -1 Do not accept Parquet because ...
> >> >
> >> > The vote will be open until Thursday May 22nd 18:00 UTC.
> >> >
> >> > = Parquet Proposal =
> >> >
> >> > == Abstract ==
> >> > Parquet is a columnar storage format for Hadoop.
> >> >
> >> > == Proposal ==
> >> >
> >> > We created Parquet to make the advantages of compressed, efficient
> >> > columnar data representation available to any project in the Hadoop
> >> > ecosystem, regardless of the choice of data processing framework, data
> >> > model, or programming language.
> >> >
> >> > == Background ==
> >> >
> >> > Parquet is built from the ground up with complex nested data structures
> >> > in mind, and uses the repetition/definition level approach to encoding
> >> > such data structures, as popularized by Google Dremel (
> >> > https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We
> >> > believe this approach is superior to simple flattening of nested name
> >> > spaces.
> >> >
> >> > Parquet is built to support very efficient compression and encoding
> >> > schemes. Parquet allows compression schemes to be specified on a
> >> > per-column level, and is future-proofed to allow adding more encodings
> >> > as they are invented and implemented. We separate the concepts of
> >> > encoding and compression, allowing Parquet consumers to implement
> >> > operators that work directly on encoded data without paying a
> >> > decompression and decoding penalty when possible.
> >> >
> >> > == Rationale ==
> >> >
> >> > Parquet is built to be used by anyone. We believe that an efficient,
> >> > well-implemented columnar storage substrate should be useful to all
> >> > frameworks without the cost of extensive and difficult to set up
> >> > dependencies.
> >> >
> >> > Furthermore, the rapid growth of the Parquet community is empowered by
> >> > open source. We believe the Apache Software Foundation is a great fit as
> >> > the long-term home for Parquet, as it provides an established process
> >> > for community-driven development and decision making by consensus. This
> >> > is exactly the model we want for future Parquet development.
> >> >
> >> > == Initial Goals ==
> >> >
> >> >  * Move the existing codebase to Apache
> >> >  * Integrate with the Apache development process
> >> >  * Ensure all dependencies are compliant with Apache License version 2.0
> >> >  * Incremental development and releases per Apache guidelines
> >> >
> >> > == Current Status ==
> >> >
> >> > Parquet has undergone 2 major releases of the core format
> >> > (https://github.com/Parquet/parquet-format/releases) and 22 releases of
> >> > the supporting set of Java libraries
> >> > (https://github.com/Parquet/parquet-mr/releases).
> >> >
> >> > The Parquet source is currently hosted at GitHub, which will seed the
> >> > Apache git repository.
> >> >
> >> > === Meritocracy ===
> >> >
> >> > We plan to invest in supporting a meritocracy. We will discuss the
> >> > requirements in an open forum. Several companies have already expressed
> >> > interest in this project, and we intend to invite additional developers
> >> > to participate. We will encourage and monitor community participation
> >> > so that privileges can be extended to those who contribute.
> >> >
> >> > === Community ===
> >> >
> >> > There is a large need for an advanced columnar storage format for Hadoop.
> >> > Parquet is being used in production by many organizations (see
> >> > https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md):
> >> >
> >> >  * Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
> >> >  * Criteo: https://twitter.com/julsimon/statuses/312114074911666177
> >> >  * Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
> >> >  * Stripe: https://twitter.com/avibryant/statuses/391339949250715648
> >> >  * Twitter: https://twitter.com/J_/statuses/315844725611581441
> >> >
> >> > By bringing Parquet into Apache, we believe that the community will grow
> >> > even bigger.
> >> >
> >> > === Core Developers ===
> >> >
> >> > Parquet was initially developed as a collaboration between Twitter,
> >> > Cloudera and Criteo.
> >> >
> >> > See
> >> > https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
> >> >
> >> > === Alignment ===
> >> >
> >> > We believe that having Parquet at Apache will help further the growth of
> >> > the big-data community, as it will encourage cooperation within the
> >> > greater ecosystem of projects spawned by Apache Hadoop. The alignment is
> >> > also beneficial to other Apache communities (such as Hadoop, Hive, Avro).
> >> >
> >> > == Known Risks ==
> >> >
> >> > === Orphaned Products ===
> >> >
> >> > The risk of the Parquet project being abandoned is minimal. There are
> >> > many organizations using Parquet in production, including Twitter,
> >> > Cloudera, Stripe, and Salesforce (
> >> > http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/).
> >> >
> >> > === Inexperience with Open Source ===
> >> >
> >> > Parquet has existed as a healthy open source project for one year. During
> >> > that time, we have curated an open-source community successfully,
> >> > attracting over 40 contributors (see
> >> > https://github.com/Parquet/parquet-mr/graphs/contributors) from a
> >> > diverse group of companies.
> >> > Several of the core contributors to the project are deeply familiar with
> >> > OSS and Apache specifically: Julien Le Dem was until recently the PMC
> >> > Chair for Apache Pig, and Dmitriy Ryaboy, Aniket Mokashi, and Jonathan
> >> > Coveney are also Apache Pig committers with contributions to several
> >> > other Apache projects. Todd Lipcon and Tom White are committers to
> >> > Apache Hadoop and multiple other related projects. Brock Noland is a
> >> > Hive committer.
> >> >
> >> > === Homogenous Developers ===
> >> >
> >> > The initial committers come from a number of companies and countries.
> >> > Parquet has an active community of developers, and we are committed to
> >> > recruiting additional committers based on their contributions to the
> >> > project. The Java library component alone has contributions from 31
> >> > individual GitHub accounts, 14 of which contributed over 1000 lines of
> >> > code.
> >> >
> >> > === Reliance on Salaried Developers ===
> >> >
> >> > It is expected that Parquet development will occur on both salaried time
> >> > and on volunteer time, after hours. The majority of initial committers
> >> > are paid by their employers to contribute to this project. However, they
> >> > are all passionate about the project, and we are confident that the
> >> > project will continue even if no salaried developers contribute to the
> >> > project. As evidence of this statement, we present the GitHub punchcard
> >> > (see https://github.com/Parquet/parquet-mr/graphs/punch-card) showing
> >> > that a lot of activity happens on weekends. We are committed to
> >> > recruiting additional committers including non-salaried developers.
> >> >
> >> > === Relationships with Other Apache Products ===
> >> >
> >> > As mentioned in the Alignment section, Parquet is closely related to
> >> > Hadoop. It provides an API that allowed it to be easily integrated with
> >> > many other Apache projects: Pig, Hive, Avro, Thrift, Spark, Drill,
> >> > Crunch, Tajo. Some of the features it provides are similar to the ORC
> >> > file format which is part of the Hive project. However, Parquet focused
> >> > on being framework agnostic and language independent and has been really
> >> > successful to that end. On top of the Apache projects mentioned above,
> >> > Parquet is also integrated with other open source projects, including
> >> > Protocol Buffers, Cloudera Impala, and Scrooge. We look forward to
> >> > continuing to collaborate with those communities, as well as other
> >> > Apache communities.
> >> >
> >> > === An Excessive Fascination with the Apache Brand ===
> >> >
> >> > Parquet is an already healthy and well known open source project. This
> >> > proposal is not for the purpose of generating publicity. Rather, the
> >> > primary benefits to joining Apache are those outlined in the Rationale
> >> > section.
> >> >
> >> > == Documentation ==
> >> >
> >> > Documentation is currently located as README markdown files:
> >> >
> >> >  * https://github.com/Parquet/parquet-format
> >> >  * https://github.com/Parquet/parquet-mr
> >> >
> >> > == Source and Intellectual Property Submission Plan ==
> >> >
> >> > The Parquet codebase is currently hosted on GitHub:
> >> > https://github.com/Parquet.
> >> >
> >> > These are the codebases that we would migrate to the Apache Software
> >> > Foundation.
> >> >
> >> > == External Dependencies ==
> >> >
> >> >
> >> >  * Junit: EPL
> >> >  * Apache Commons: ALv2
> >> >  * Apache Thrift: ALv2
> >> >  * Apache Maven: ALv2
> >> >  * Apache Avro: ALv2
> >> >  * Apache Hadoop: ALv2
> >> >  * Google Guava: ALv2
> >> >  * Google Protobuf: New BSD License
> >> >
> >> > == Cryptography ==
> >> >
> >> > We do not expect Parquet to be a controlled export item due to the use
> >> > of encryption.
> >> >
> >> > == Required Resources ==
> >> >
> >> > === Mailing lists ===
> >> >
> >> >  * private@parquet.incubator.apache.org
> >> >  * commits@parquet.incubator.apache.org
> >> >  * dev@parquet.incubator.apache.org
> >> >
> >> > == Subversion Directory ==
> >> >
> >> > Git is the preferred source control system:
> >> >
> >> >  * git://git.apache.org/parquet-format
> >> >  * git://git.apache.org/parquet-mr
> >> >
> >> > == Issue Tracking ==
> >> >
> >> > We'd like to keep using the Git review and issue tracking tools, and to
> >> > control the closing of pull requests through git commit messages in
> >> > git.apache.org.
> >> >
> >> > == Initial Committers ==
> >> >
> >> >  * Aniket Mokashi <an...@gmail.com>
> >> >  * Brock Noland <br...@apache.org>
> >> >  * Chris Aniszczyk <ca...@gmail.com>
> >> >  * Dmitriy Ryaboy <dv...@apache.org>
> >> >  * Jake Farrell <jf...@apache.org>
> >> >  * Jonathan Coveney <jc...@gmail.com>
> >> >  * Julien Le Dem <ju...@apache.org>
> >> >  * Lukas Nalezenec <lu...@gmail.com>
> >> >  * Marcel Kornacker <ma...@cloudera.com>
> >> >  * Mickael Lacour
> >> >  * Nong Li <no...@cloudera.com>
> >> >  * Remy Pecqueur
> >> >  * Ryan Blue <bl...@cloudera.com>
> >> >  * Tianshuo Deng <de...@gmail.com>
> >> >  * Tom White <to...@apache.org>
> >> >  * Wesley Peck
> >> >
> >> > == Affiliations ==
> >> >
> >> >  * Aniket Mokashi - Twitter
> >> >  * Brock Noland - Cloudera
> >> >  * Chris Aniszczyk - Twitter
> >> >  * Dmitriy Ryaboy - Twitter
> >> >  * Jake Farrell
> >> >  * Jonathan Coveney - Twitter
> >> >  * Julien Le Dem - Twitter
> >> >  * Lukas Nalezenec
> >> >  * Marcel Kornacker - Cloudera
> >> >  * Mickael Lacour - Criteo
> >> >  * Nong Li - Cloudera
> >> >  * Remy Pecqueur - Criteo
> >> >  * Ryan Blue - Cloudera
> >> >  * Tianshuo Deng - Twitter
> >> >  * Tom White - Cloudera
> >> >  * Wesley Peck - ARRIS, Inc.
> >> >
> >> > == Sponsors ==
> >> >
> >> > === Champion ===
> >> >
> >> >  * Todd Lipcon
> >> >
> >> > === Nominated Mentors ===
> >> >
> >> >  * Tom White
> >> >  * Chris Mattmann
> >> >  * Jake Farrell
> >> >  * Roman Shaposhnik
> >> >
> >> > === Sponsoring Entity ===
> >> >
> >> > The Apache Incubator
> >> >
> >> > --
> >> > Cheers,
> >> >
> >> > Chris Aniszczyk
> >> > http://aniszczyk.org
> >> > +1 512 961 6719
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Cheers,
> >
> > Chris Aniszczyk
> > http://aniszczyk.org
> > +1 512 961 6719
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>


-- 
Cheers,

Chris Aniszczyk
http://aniszczyk.org
+1 512 961 6719