Posted to general@incubator.apache.org by Todd Lipcon <to...@apache.org> on 2015/11/24 20:32:51 UTC

[VOTE] Accept Kudu into the Apache Incubator

Hi all,

Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
pasted below and also available on the wiki at:
https://wiki.apache.org/incubator/KuduProposal

The proposal is unchanged since the original version, except for the
addition of Carl Steinbach as a Mentor.

Please cast your votes:

[] +1, accept Kudu into the Incubator
[] +/-0, positive/negative non-counted expression of feelings
[] -1, do not accept Kudu into the Incubator (please state reasoning)

Given the US holiday this week, I imagine many folks are traveling or
otherwise offline. So, let's run the vote for a full week rather than the
traditional 72 hours. Unless the IPMC objects to the extended voting
period, the vote will close on Tues, Dec 1st at noon PST.

Thanks
-Todd
-----

= Kudu Proposal =

== Abstract ==

Kudu is a distributed columnar storage engine built for the Apache Hadoop
ecosystem.

== Proposal ==

Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. Kudu distributes data using horizontal partitioning and
replicates each partition using Raft consensus, providing low
mean-time-to-recovery and low tail latencies. Kudu is designed within the
context of the Apache Hadoop ecosystem and supports many integrations with
other data analytics projects both inside and outside of the Apache
Software Foundation.



We propose to incubate Kudu as a project of the Apache Software Foundation.

== Background ==

In recent years, explosive growth in the amount of data being generated and
captured by enterprises has resulted in the rapid adoption of open source
technology which is able to store massive data sets at scale and at low
cost. In particular, the Apache Hadoop ecosystem has become a focal point
for such “big data” workloads, because many traditional open source
database systems have lagged in offering a scalable alternative.



Structured storage in the Hadoop ecosystem has typically been achieved in
two ways: for static data sets, data is typically stored on Apache HDFS
using binary data formats such as Apache Avro or Apache Parquet. However,
neither HDFS nor these formats has any provision for updating individual
records, or for efficient random access. Mutable data sets are typically
stored in semi-structured stores such as Apache HBase or Apache Cassandra.
These systems allow for low-latency record-level reads and writes, but lag
far behind the static file formats in terms of sequential read throughput
for applications such as SQL-based analytics or machine learning.



Kudu is a new storage system designed and implemented from the ground up to
fill this gap between high-throughput sequential-access storage systems
such as HDFS and low-latency random-access systems such as HBase or
Cassandra. While these existing systems continue to hold advantages in some
situations, Kudu offers a “happy medium” alternative that can dramatically
simplify the architecture of many common workloads. In particular, Kudu
offers a simple API for row-level inserts, updates, and deletes, while
providing table scans at throughputs similar to Parquet, a commonly used
columnar format for static data.



More information on Kudu can be found at the existing open source project
website: http://getkudu.io and in particular in the Kudu white-paper PDF:
http://getkudu.io/kudu.pdf from which the above was excerpted.

== Rationale ==

As described above, Kudu fills an important gap in the open source storage
ecosystem. After our initial open source project release in September 2015,
we have seen a great amount of interest across a diverse set of users and
companies. We believe that, as a storage system, it is critical to build an
equally diverse set of contributors in the development community. Our
experiences as committers and PMC members on other Apache projects have
taught us the value of diverse communities in ensuring both longevity and
high quality for such foundational systems.

== Initial Goals ==

 * Move the existing codebase, website, documentation, and mailing lists to
Apache-hosted infrastructure
 * Work with the infrastructure team to implement and approve our code
review, build, and testing workflows in the context of the ASF
 * Incremental development and releases per Apache guidelines

== Current Status ==

==== Releases ====

Kudu has undergone one public release, tagged here:
https://github.com/cloudera/kudu/tree/kudu0.5.0-release

This initial release was not performed in the typical ASF fashion -- no
source tarball was released, but rather only convenience binaries made
available in Cloudera’s repositories. We will adopt the ASF source release
process upon joining the incubator.


==== Source ====

Kudu’s source is currently hosted on GitHub at
https://github.com/cloudera/kudu

This repository will be transitioned to Apache’s git hosting during
incubation.



==== Code review ====

Kudu’s code reviews are currently public and hosted on Gerrit at
http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu

The Kudu developer community is very happy with Gerrit and hopes to work
with the Apache Infrastructure team to figure out how we can continue to
use Gerrit within ASF policies.



==== Issue tracking ====

Kudu’s bug and feature tracking is hosted on JIRA at:
https://issues.cloudera.org/projects/KUDU/summary

This JIRA instance contains bugs and development discussion dating back 2
years prior to Kudu’s open source release and will provide an initial seed
for the ASF JIRA.



==== Community discussion ====

Kudu has several public discussion forums, linked here:
http://getkudu.io/community.html



==== Build Infrastructure ====

The Kudu Gerrit instance is configured to only allow patches to be
committed after running them through an extensive set of pre-commit tests
and code lints. The project currently makes use of elastic public cloud
resources to perform these tests. Until this point, these resources have
been internal to Cloudera, though we are currently investing in moving to a
publicly accessible infrastructure.



==== Development practices ====

Given that Kudu is a persistent storage engine, the community has a high
quality bar for contributions to its core. We have a firm belief that high
quality is achieved through automation, not manual inspection, and hence
put a focus on thorough testing and build infrastructure to ensure that
bar. The development community also practices review-then-commit for all
changes to ensure that changes are accompanied by appropriate tests, are
well commented, etc.

Rather than seeing these practices as barriers to contribution, we believe
that a fully automated and standardized review and testing practice makes
it easier for new contributors to have patches accepted. Any new developer
may post a patch to Gerrit using the same workflow as a seasoned
contributor, and the same suite of tests will be automatically run. If the
tests pass, a committer can quickly review and commit the contribution from
their web browser.

=== Meritocracy ===

We believe strongly in meritocracy in electing committers and PMC members.
We believe that contributions can come in forms other than just code: for
example, one of our initial proposed committers has contributed solely in
the area of project documentation. We will encourage contributions and
participation of all types, and ensure that contributors are appropriately
recognized.

=== Community ===

Though Kudu is relatively new as an open source project, it has already
seen promising growth in its community across several organizations:

 * '''Cloudera''' is the original development sponsor for Kudu.
 * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
production use case, contributing code, benchmarks, feedback, and
conference talks.
 * '''Intel''' has contributed optimizations related to their hardware
technologies.
 * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
use case, and has been contributing bug reports and product feedback.
 * '''Dremio''' is working on integration with Apache Drill and exploring
using Kudu in a production use case.
 * Several community-built Docker images, tutorials, and blog posts have
sprouted up since Kudu’s release.



By bringing Kudu to Apache, we hope to encourage further contribution from
the above organizations as well as to engage new users and contributors in
the community.

=== Core Developers ===

Kudu was initially developed as a project at Cloudera. Most of the
contributions to date have been by developers employed by Cloudera.



Many of the developers are committers or PMC members on other Apache
projects.

=== Alignment ===

As a project in the big data ecosystem, Kudu is aligned with several other
ASF projects. Kudu includes input/output format integration with Apache
Hadoop, and this integration can also provide a bridge to Apache Spark. We
are planning to integrate with Apache Hive in the near future. We also
integrate closely with Cloudera Impala, which is also currently being
proposed for incubation. We have also scheduled a hackathon with the Apache
Drill team to work on integration with that query engine.

== Known Risks ==

=== Orphaned Products ===

The risk of Kudu being abandoned is low. Cloudera has invested a great deal
in the initial development of the project, and intends to grow its
investment over time as Kudu becomes a product adopted by its customer
base. Several other organizations are also experimenting with Kudu for
production use cases which would live for many years.

=== Inexperience with Open Source ===

Kudu has been released in the open for less than two months. However, from
our very first public announcement we have been committed to open-source
style development:

 * our code reviews are fully public and documented on a mailing list
 * our daily development chatter is in a public chat room
 * we send out weekly “community status” reports highlighting news and
contributions
 * we published our entire JIRA history and discuss bugs in the open
 * we published our entire Git commit history, going back three years (no
squashing)



Several of the initial committers are experienced open source developers,
several of whom are committers and/or PMC members on other ASF projects
(Hadoop, HBase, Thrift, Flume, et al.). Those who are not ASF committers have
experience on non-ASF open source projects (Kiji, open-vm-tools, et al.).

=== Homogenous Developers ===

The initial committers are employees or former employees of Cloudera.
However, the committers are spread across multiple offices (Palo Alto, San
Francisco, Melbourne), so the team is familiar with working in a
distributed environment across varied time zones.



The project has received some contributions from developers outside of
Cloudera, and is starting to attract a ''user'' community as well. We hope
to continue to encourage contributions from these developers and community
members and grow them into committers after they have had time to continue
their contributions.

=== Reliance on Salaried Developers ===

As mentioned above, the majority of development up to this point has been
sponsored by Cloudera. We have seen several community users participate in
discussions who are hobbyists interested in distributed systems and
databases, and hope that they will continue their participation in the
project going forward.

=== Relationships with Other Apache Products ===

Kudu is currently related to the following other Apache projects:

 * Hadoop: Kudu provides MapReduce input/output formats for integration
 * Spark: Kudu integrates with Spark via the above-mentioned input formats,
and work is progressing on support for Spark DataFrames and Spark SQL.



The Kudu team has reached out to several other Apache projects to start
discussing integrations, including Flume, Kafka, Hive, and Drill.



Kudu integrates with Impala, which is also being proposed for incubation.



Kudu is already collaborating on ValueVector, a proposed TLP spinning out
from the Apache Drill community.



We look forward to continuing to integrate and collaborate with these
communities.

=== An Excessive Fascination with the Apache Brand ===

Many of the initial committers are already experienced Apache committers,
and understand the true value provided by the Apache Way and the principles
of the ASF. We believe that this development and contribution model is
especially appropriate for storage products, where Apache’s
community-over-code philosophy ensures long term viability and
consensus-based participation.

== Documentation ==

 * Documentation is written in AsciiDoc and committed in the Kudu source
repository: https://github.com/cloudera/kudu/tree/master/docs
 * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
above repository.
 * A LaTeX whitepaper is also published, and the source is available within
the same repository.
 * APIs are documented within the source code as JavaDoc or C++-style
documentation comments.
 * Many design documents are stored within the source code repository as
text files next to the code being documented.

== Source and Intellectual Property Submission Plan ==

The Kudu codebase and web site are currently hosted on GitHub and will be
transitioned to the ASF repositories during incubation. Kudu is already
licensed under the Apache 2.0 license.



Some portions of the code are imported from other open source projects
under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
other than the initial committers. These copyright notices are maintained
in those files as well as in a top-level NOTICE.txt file. We believe this to
be permissible under the license terms and ASF policies, as confirmed via
a recent thread on general@incubator.apache.org.



The “Kudu” name is not a registered trademark, though before the initial
release of the project, we performed a trademark search and Cloudera’s
legal counsel deemed it acceptable in the context of a data storage engine.
There exists an unrelated open source project by the same name related to
deployments on Microsoft’s Azure cloud service. We have been in contact
with legal counsel from Microsoft and have obtained their approval for the
use of the Kudu name.



Cloudera currently owns several domain names related to Kudu (getkudu.io,
kududb.io, et al) which will be transferred to the ASF and redirected to
the official page during incubation.



Portions of Kudu are protected by pending or published patents owned by
Cloudera. Given the protections already granted by the Apache License, we
do not anticipate any explicit licensing or transfer of this intellectual
property.

== External Dependencies ==

The full set of dependencies and licenses is listed in
https://github.com/cloudera/kudu/blob/master/LICENSE.txt

and summarized here:

 * '''Twitter Bootstrap''': Apache 2.0
 * '''d3''': BSD 3-clause
 * '''epoch JS library''': MIT
 * '''lz4''': BSD 2-clause
 * '''gflags''': BSD 3-clause
 * '''glog''': BSD 3-clause
 * '''gperftools''': BSD 3-clause
 * '''libev''': BSD 2-clause
 * '''squeasel''': MIT
 * '''protobuf''': BSD 3-clause
 * '''rapidjson''': MIT
 * '''snappy''': BSD 3-clause
 * '''trace-viewer''': BSD 3-clause
 * '''zlib''': zlib license
 * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
 * '''bitshuffle''': MIT
 * '''boost''': Boost license
 * '''curl''': MIT
 * '''libunwind''': MIT
 * '''nvml''': BSD 3-clause
 * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
 * '''openssl''': OpenSSL License (BSD-alike)

 * '''Guava''': Apache 2.0
 * '''StumbleUpon Async''': BSD
 * '''Apache Hadoop''': Apache 2.0
 * '''Apache log4j''': Apache 2.0
 * '''Netty''': Apache 2.0
 * '''slf4j''': MIT
 * '''Apache Commons''': Apache 2.0
 * '''murmur''': Apache 2.0


'''Build/test-only dependencies''':

 * '''CMake''': BSD 3-clause
 * '''gcovr''': BSD 3-clause
 * '''gmock''': BSD 3-clause
 * '''Apache Maven''': Apache 2.0
 * '''JUnit''': EPL
 * '''Mockito''': MIT

== Cryptography ==

Kudu does not currently include any cryptography-related code.

== Required Resources ==

=== Mailing lists ===

 * private@kudu.incubator.apache.org (PMC)
 * commits@kudu.incubator.apache.org (git push emails)
 * issues@kudu.incubator.apache.org (JIRA issue feed)
 * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
 * user@kudu.incubator.apache.org (User questions)


=== Repository ===

 * git://git.apache.org/kudu

=== Gerrit ===

We hope to continue using Gerrit for our code review and commit workflow.
The Kudu team has already been in contact with Jake Farrell to start
discussions on how Gerrit can fit into the ASF. We know that several other
ASF projects and podlings are also interested in Gerrit.



If the Infrastructure team does not have the bandwidth to support Gerrit,
we will continue to support our own instance of Gerrit for Kudu, and make
the necessary integrations such that commits are properly authenticated and
maintain sufficient provenance to uphold the ASF standards (e.g. via the
solution adopted by the AsterixDB podling).

== Issue Tracking ==

We would like to import our current JIRA project into the ASF JIRA, such
that our historical commit messages and code comments continue to reference
the appropriate bug numbers.

== Initial Committers ==

 * Adar Dembo adar@cloudera.com
 * Alex Feinberg alex@strlen.net
 * Andrew Wang wang@apache.org
 * Dan Burkert dan@cloudera.com
 * David Alves dralves@apache.org
 * Jean-Daniel Cryans jdcryans@apache.org
 * Mike Percy mpercy@apache.org
 * Misty Stanley-Jones misty@apache.org
 * Todd Lipcon todd@apache.org

The initial list of committers was seeded by listing those contributors who
have contributed 20 or more patches in the last 12 months, indicating that
they are active and have achieved merit through participation on the
project. We chose not to include other contributors who either have not yet
contributed a significant number of patches, or whose contributions are far
in the past and whom we don’t expect to be active within the ASF.

== Affiliations ==

 * Adar Dembo - Cloudera
 * Alex Feinberg - Forward Networks
 * Andrew Wang - Cloudera
 * Dan Burkert - Cloudera
 * David Alves - Cloudera
 * Jean-Daniel Cryans - Cloudera
 * Mike Percy - Cloudera
 * Misty Stanley-Jones - Cloudera
 * Todd Lipcon - Cloudera

== Sponsors ==

=== Champion ===

 * Todd Lipcon

=== Nominated Mentors ===

 * Jake Farrell - ASF Member and Infra team member, Acquia
 * Brock Noland - ASF Member, StreamSets
 * Michael Stack - ASF Member, Cloudera
 * Jarek Jarcec Cecho - ASF Member, Cloudera
 * Chris Mattmann - ASF Member, NASA JPL and USC
 * Julien Le Dem - Incubator PMC, Dremio
 * Carl Steinbach - ASF Member, LinkedIn

=== Sponsoring Entity ===

The Apache Incubator

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Doug Cutting <cu...@apache.org>.

+1 (binding)

Doug

On Wed, Nov 25, 2015 at 8:45 AM, Chris Douglas <cd...@apache.org> wrote:

> +1 (binding) -C
>
> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
> to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -----
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations
> with
> > other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software
> Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated
> and
> > captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache
> Cassandra.
> > These systems allow for low-latency record-level reads and writes, but
> lag
> > far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up
> to
> > fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in
> some
> > situations, Kudu offers a “happy medium” alternative that can
> dramatically
> > simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper
> PDF:
> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source
> storage
> > ecosystem. After our initial open source project release in September
> 2015,
> > we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build
> an
> > equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> >  * Move the existing codebase, website, documentation, and mailing lists
> to
> > Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> >  * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> >
> > ==== Releases ====
> >
> > Kudu has undergone one public release, tagged here
> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >
> > This initial release was not performed in the typical ASF fashion -- no
> > source tarball was released, but rather only convenience binaries made
> > available in Cloudera’s repositories. We will adopt the ASF source
> release
> > process upon joining the incubator.
> >
> >
> > ==== Source ====
> >
> > Kudu’s source is currently hosted on GitHub at
> > https://github.com/cloudera/kudu
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> >
> >
> > ==== Code review ====
> >
> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >
> > The Kudu developer community is very happy with gerrit and hopes to work
> > with the Apache Infrastructure team to figure out how we can continue to
> > use Gerrit within ASF policies.
> >
> >
> >
> > ==== Issue tracking ====
> >
> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/KUDU/summary
> >
> > This JIRA instance contains bugs and development discussion dating back 2
> > years prior to Kudu’s open source release and will provide an initial
> seed
> > for the ASF JIRA.
> >
> >
> >
> > ==== Community discussion ====
> >
> > Kudu has several public discussion forums, linked here:
> > http://getkudu.io/community.html
> >
> >
> >
> > ==== Build Infrastructure ====
> >
> > The Kudu Gerrit instance is configured to only allow patches to be
> > committed after running them through an extensive set of pre-commit tests
> > and code lints. The project currently makes use of elastic public cloud
> > resources to perform these tests. Until this point, these resources have
> > been internal to Cloudera, though we are currently investing in moving
> to a
> > publicly accessible infrastructure.
> >
> >
> >
> > ==== Development practices ====
> >
> > Given that Kudu is a persistent storage engine, the community has a high
> > quality bar for contributions to its core. We have a firm belief that
> high
> > quality is achieved through automation, not manual inspection, and hence
> > put a focus on thorough testing and build infrastructure to ensure that
> > bar. The development community also practices review-then-commit for all
> > changes to ensure that changes are accompanied by appropriate tests, are
> > well commented, etc.
> >
> > Rather than seeing these practices as barriers to contribution, we
> believe
> > that a fully automated and standardized review and testing practice makes
> > it easier for new contributors to have patches accepted. Any new
> developer
> > may post a patch to Gerrit using the same workflow as a seasoned
> > contributor, and the same suite of tests will be automatically run. If
> the
> > tests pass, a committer can quickly review and commit the contribution
> from
> > their web browser.
> >
> > === Meritocracy ===
> >
> > We believe strongly in meritocracy in electing committers and PMC
> members.
> > We believe that contributions can come in forms other than just code: for
> > example, one of our initial proposed committers has contributed solely in
> > the area of project documentation. We will encourage contributions and
> > participation of all types, and ensure that contributors are
> appropriately
> > recognized.
> >
> > === Community ===
> >
> > Though Kudu is relatively new as an open source project, it has already
> > seen promising growth in its community across several organizations:
> >
> >  * '''Cloudera''' is the original development sponsor for Kudu.
> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > production use case, contributing code, benchmarks, feedback, and
> > conference talks.
> >  * '''Intel''' has contributed optimizations related to their hardware
> > technologies.
> >  * '''Dropbox''' has been experimenting with Kudu for a machine
> monitoring
> > use case, and has been contributing bug reports and product feedback.
> >  * '''Dremio''' is working on integration with Apache Drill and exploring
> > using Kudu in a production use case.
> >  * Several community-built Docker images, tutorials, and blog posts have
> > sprouted up since Kudu’s release.
> >
> >
> >
> > By bringing Kudu to Apache, we hope to encourage further contribution
> from
> > the above organizations as well as to engage new users and contributors
> in
> > the community.
> >
> > === Core Developers ===
> >
> > Kudu was initially developed as a project at Cloudera. Most of the
> > contributions to date have been by developers employed by Cloudera.
> >
> >
> >
> > Many of the developers are committers or PMC members on other Apache
> > projects.
> >
> > === Alignment ===
> >
> > As a project in the big data ecosystem, Kudu is aligned with several
> other
> > ASF projects. Kudu includes input/output format integration with Apache
> > Hadoop, and this integration can also provide a bridge to Apache Spark.
> We
> > are planning to integrate with Apache Hive in the near future. We also
> > integrate closely with Cloudera Impala, which is also currently being
> > proposed for incubation. We have also scheduled a hackathon with the
> Apache
> > Drill team to work on integration with that query engine.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Kudu being abandoned is low. Cloudera has invested a great
> deal
> > in the initial development of the project, and intends to grow its
> > investment over time as Kudu becomes a product adopted by its customer
> > base. Several other organizations are also experimenting with Kudu for
> > production use cases which would live for many years.
> >

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Chris Douglas <cd...@apache.org>.

+1 (binding) -C


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Sree V <sr...@yahoo.com.INVALID>.

+1 (non-binding)

Thanking you.
With Regards
Sree

On Monday, November 30, 2015 9:33 AM, stack <sa...@gmail.com> wrote:

+1 (binding)
St.Ack
On Nov 24, 2015 11:33 AM, "Todd Lipcon" <to...@apache.org> wrote:

> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site is currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses are listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''':MIT license
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by stack <sa...@gmail.com>.

+1 (binding)
St.Ack
On Nov 24, 2015 11:33 AM, "Todd Lipcon" <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site is currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and who we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Amol Kekre <am...@datatorrent.com>.

+1 (non-binding)

Amol


On Wed, Nov 25, 2015 at 3:19 AM, Roman Shaposhnik <rv...@apache.org> wrote:

> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> +1 (binding)
>
> Best of luck guys!
>
> Thanks,
> Roman.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Roman Shaposhnik <rv...@apache.org>.

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)

+1 (binding)

Best of luck guys!

Thanks,
Roman.

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Tony Kurc <tr...@gmail.com>.

+1 (non-binding)
On Nov 26, 2015 3:04 PM, "Joe Witt" <jo...@gmail.com> wrote:

> +1 (non-binding)
>
> On Wed, Nov 25, 2015 at 5:26 PM, Hitesh Shah <hi...@apache.org> wrote:
> > +1 (binding)
> >
> > — Hitesh

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Joe Witt <jo...@gmail.com>.

+1 (non-binding)

On Wed, Nov 25, 2015 at 5:26 PM, Hitesh Shah <hi...@apache.org> wrote:
> +1 (binding)
>
> — Hitesh
>
> On Nov 24, 2015, at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
>
>> Hi all,
>>
>> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
>> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
>> pasted below and also available on the wiki at:
>> https://wiki.apache.org/incubator/KuduProposal
>>
>> The proposal is unchanged since the original version, except for the
>> addition of Carl Steinbach as a Mentor.
>>
>> Please cast your votes:
>>
>> [] +1, accept Kudu into the Incubator
>> [] +/-0, positive/negative non-counted expression of feelings
>> [] -1, do not accept Kudu into the incubator (please state reasoning)
>>
>> Given the US holiday this week, I imagine many folks are traveling or
>> otherwise offline. So, let's run the vote for a full week rather than the
>> traditional 72 hours. Unless the IPMC objects to the extended voting
>> period, the vote will close on Tues, Dec 1st at noon PST.
>>
>> Thanks
>> -Todd
>> -----
>>
>> = Kudu Proposal =
>>
>> == Abstract ==
>>
>> Kudu is a distributed columnar storage engine built for the Apache Hadoop
>> ecosystem.
>>
>> == Proposal ==
>>
>> Kudu is an open source storage engine for structured data which supports
>> low-latency random access together with efficient analytical access
>> patterns. Kudu distributes data using horizontal partitioning and
>> replicates each partition using Raft consensus, providing low
>> mean-time-to-recovery and low tail latencies. Kudu is designed within the
>> context of the Apache Hadoop ecosystem and supports many integrations with
>> other data analytics projects both inside and outside of the Apache
>> Software Foundation.
>>
>>
>>
>> We propose to incubate Kudu as a project of the Apache Software Foundation.
>>
>> == Background ==
>>
>> In recent years, explosive growth in the amount of data being generated and
>> captured by enterprises has resulted in the rapid adoption of open source
>> technology which is able to store massive data sets at scale and at low
>> cost. In particular, the Apache Hadoop ecosystem has become a focal point
>> for such “big data” workloads, because many traditional open source
>> database systems have lagged in offering a scalable alternative.
>>
>>
>>
>> Structured storage in the Hadoop ecosystem has typically been achieved in
>> two ways: for static data sets, data is typically stored on Apache HDFS
>> using binary data formats such as Apache Avro or Apache Parquet. However,
>> neither HDFS nor these formats have any provision for updating individual
>> records, or for efficient random access. Mutable data sets are typically
>> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
>> These systems allow for low-latency record-level reads and writes, but lag
>> far behind the static file formats in terms of sequential read throughput
>> for applications such as SQL-based analytics or machine learning.
>>
>>
>>
>> Kudu is a new storage system designed and implemented from the ground up to
>> fill this gap between high-throughput sequential-access storage systems
>> such as HDFS and low-latency random-access systems such as HBase or
>> Cassandra. While these existing systems continue to hold advantages in some
>> situations, Kudu offers a “happy medium” alternative that can dramatically
>> simplify the architecture of many common workloads. In particular, Kudu
>> offers a simple API for row-level inserts, updates, and deletes, while
>> providing table scans at throughputs similar to Parquet, a commonly-used
>> columnar format for static data.
>>
>>
>>
>> More information on Kudu can be found at the existing open source project
>> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
>> http://getkudu.io/kudu.pdf from which the above was excerpted.
>>
>> == Rationale ==
>>
>> As described above, Kudu fills an important gap in the open source storage
>> ecosystem. After our initial open source project release in September 2015,
>> we have seen a great amount of interest across a diverse set of users and
>> companies. We believe that, as a storage system, it is critical to build an
>> equally diverse set of contributors in the development community. Our
>> experiences as committers and PMC members on other Apache projects have
>> taught us the value of diverse communities in ensuring both longevity and
>> high quality for such foundational systems.
>>
>> == Initial Goals ==
>>
>> * Move the existing codebase, website, documentation, and mailing lists to
>> Apache-hosted infrastructure
>> * Work with the infrastructure team to implement and approve our code
>> review, build, and testing workflows in the context of the ASF
>> * Incremental development and releases per Apache guidelines
>>
>> == Current Status ==
>>
>> ==== Releases ====
>>
>> Kudu has undergone one public release, tagged here
>> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>>
>> This initial release was not performed in the typical ASF fashion -- no
>> source tarball was released, but rather only convenience binaries made
>> available in Cloudera’s repositories. We will adopt the ASF source release
>> process upon joining the incubator.
>>
>>
>> ==== Source ====
>>
>> Kudu’s source is currently hosted on GitHub at
>> https://github.com/cloudera/kudu
>>
>> This repository will be transitioned to Apache’s git hosting during
>> incubation.
>>
>>
>>
>> ==== Code review ====
>>
>> Kudu’s code reviews are currently public and hosted on Gerrit at
>> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>>
>> The Kudu developer community is very happy with Gerrit and hopes to work
>> with the Apache Infrastructure team to figure out how we can continue to
>> use Gerrit within ASF policies.
>>
>>
>>
>> ==== Issue tracking ====
>>
>> Kudu’s bug and feature tracking is hosted on JIRA at:
>> https://issues.cloudera.org/projects/KUDU/summary
>>
>> This JIRA instance contains bugs and development discussion dating back 2
>> years prior to Kudu’s open source release and will provide an initial seed
>> for the ASF JIRA.
>>
>>
>>
>> ==== Community discussion ====
>>
>> Kudu has several public discussion forums, linked here:
>> http://getkudu.io/community.html
>>
>>
>>
>> ==== Build Infrastructure ====
>>
>> The Kudu Gerrit instance is configured to only allow patches to be
>> committed after running them through an extensive set of pre-commit tests
>> and code lints. The project currently makes use of elastic public cloud
>> resources to perform these tests. Until this point, these resources have
>> been internal to Cloudera, though we are currently investing in moving to a
>> publicly accessible infrastructure.
>>
>>
>>
>> ==== Development practices ====
>>
>> Given that Kudu is a persistent storage engine, the community has a high
>> quality bar for contributions to its core. We have a firm belief that high
>> quality is achieved through automation, not manual inspection, and hence
>> put a focus on thorough testing and build infrastructure to ensure that
>> bar. The development community also practices review-then-commit for all
>> changes to ensure that changes are accompanied by appropriate tests, are
>> well commented, etc.
>>
>> Rather than seeing these practices as barriers to contribution, we believe
>> that a fully automated and standardized review and testing practice makes
>> it easier for new contributors to have patches accepted. Any new developer
>> may post a patch to Gerrit using the same workflow as a seasoned
>> contributor, and the same suite of tests will be automatically run. If the
>> tests pass, a committer can quickly review and commit the contribution from
>> their web browser.
>>
>> === Meritocracy ===
>>
>> We believe strongly in meritocracy in electing committers and PMC members.
>> We believe that contributions can come in forms other than just code: for
>> example, one of our initial proposed committers has contributed solely in
>> the area of project documentation. We will encourage contributions and
>> participation of all types, and ensure that contributors are appropriately
>> recognized.
>>
>> === Community ===
>>
>> Though Kudu is relatively new as an open source project, it has already
>> seen promising growth in its community across several organizations:
>>
>> * '''Cloudera''' is the original development sponsor for Kudu.
>> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
>> production use case, contributing code, benchmarks, feedback, and
>> conference talks.
>> * '''Intel''' has contributed optimizations related to their hardware
>> technologies.
>> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
>> use case, and has been contributing bug reports and product feedback.
>> * '''Dremio''' is working on integration with Apache Drill and exploring
>> using Kudu in a production use case.
>> * Several community-built Docker images, tutorials, and blog posts have
>> sprouted up since Kudu’s release.
>>
>>
>>
>> By bringing Kudu to Apache, we hope to encourage further contribution from
>> the above organizations as well as to engage new users and contributors in
>> the community.
>>
>> === Core Developers ===
>>
>> Kudu was initially developed as a project at Cloudera. Most of the
>> contributions to date have been by developers employed by Cloudera.
>>
>>
>>
>> Many of the developers are committers or PMC members on other Apache
>> projects.
>>
>> === Alignment ===
>>
>> As a project in the big data ecosystem, Kudu is aligned with several other
>> ASF projects. Kudu includes input/output format integration with Apache
>> Hadoop, and this integration can also provide a bridge to Apache Spark. We
>> are planning to integrate with Apache Hive in the near future. We also
>> integrate closely with Cloudera Impala, which is also currently being
>> proposed for incubation. We have also scheduled a hackathon with the Apache
>> Drill team to work on integration with that query engine.
>>
>> == Known Risks ==
>>
>> === Orphaned Products ===
>>
>> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
>> in the initial development of the project, and intends to grow its
>> investment over time as Kudu becomes a product adopted by its customer
>> base. Several other organizations are also experimenting with Kudu for
>> production use cases which would live for many years.
>>
>> === Inexperience with Open Source ===
>>
>> Kudu has been released in the open for less than two months. However, from
>> our very first public announcement we have been committed to open-source
>> style development:
>>
>> * our code reviews are fully public and documented on a mailing list
>> * our daily development chatter is in a public chat room
>> * we send out weekly “community status” reports highlighting news and
>> contributions
>> * we published our entire JIRA history and discuss bugs in the open
>> * we published our entire Git commit history, going back three years (no
>> squashing)
>>
>>
>>
>> Several of the initial committers are experienced open source developers,
>> several being committers and/or PMC members on other ASF projects (Hadoop,
>> HBase, Thrift, Flume, et al). Those who are not ASF committers have
>> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>>
>> === Homogeneous Developers ===
>>
>> The initial committers are employees or former employees of Cloudera.
>> However, the committers are spread across multiple offices (Palo Alto, San
>> Francisco, Melbourne), so the team is familiar with working in a
>> distributed environment across varied time zones.
>>
>>
>>
>> The project has received some contributions from developers outside of
>> Cloudera, and is starting to attract a ''user'' community as well. We hope
>> to continue to encourage contributions from these developers and community
>> members and grow them into committers after they have had time to continue
>> their contributions.
>>
>> === Reliance on Salaried Developers ===
>>
>> As mentioned above, the majority of development up to this point has been
>> sponsored by Cloudera. We have seen several community users participate in
>> discussions who are hobbyists interested in distributed systems and
>> databases, and hope that they will continue their participation in the
>> project going forward.
>>
>> === Relationships with Other Apache Products ===
>>
>> Kudu is currently related to the following other Apache projects:
>>
>> * Hadoop: Kudu provides MapReduce input/output formats for integration
>> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
>> and work is progressing on support for Spark Data Frames and Spark SQL.
>>
>>
>>
>> The Kudu team has reached out to several other Apache projects to start
>> discussing integrations, including Flume, Kafka, Hive, and Drill.
>>
>>
>>
>> Kudu integrates with Impala, which is also being proposed for incubation.
>>
>>
>>
>> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
>> from the Apache Drill community.
>>
>>
>>
>> We look forward to continuing to integrate and collaborate with these
>> communities.
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> Many of the initial committers are already experienced Apache committers,
>> and understand the true value provided by the Apache Way and the principles
>> of the ASF. We believe that this development and contribution model is
>> especially appropriate for storage products, where Apache’s
>> community-over-code philosophy ensures long term viability and
>> consensus-based participation.
>>
>> == Documentation ==
>>
>> * Documentation is written in AsciiDoc and committed in the Kudu source
>> repository:
>>
>> * https://github.com/cloudera/kudu/tree/master/docs
>>
>>
>>
>> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
>> above repository.
>>
>> * A LaTeX whitepaper is also published, and the source is available within
>> the same repository.
>> * APIs are documented within the source code as JavaDoc or C++-style
>> documentation comments.
>> * Many design documents are stored within the source code repository as
>> text files next to the code being documented.
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> The Kudu codebase and web site are currently hosted on GitHub and will be
>> transitioned to the ASF repositories during incubation. Kudu is already
>> licensed under the Apache 2.0 license.
>>
>>
>>
>> Some portions of the code are imported from other open source projects
>> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
>> other than the initial committers. These copyright notices are maintained
>> in those files as well as a top-level NOTICE.txt file. We believe this to
>> be permissible under the license terms and ASF policies, and have confirmed
>> this via a recent thread on general@incubator.apache.org.
>>
>>
>>
>> The “Kudu” name is not a registered trademark, though before the initial
>> release of the project, we performed a trademark search and Cloudera’s
>> legal counsel deemed it acceptable in the context of a data storage engine.
>> There exists an unrelated open source project by the same name related to
>> deployments on Microsoft’s Azure cloud service. We have been in contact
>> with legal counsel from Microsoft and have obtained their approval for the
>> use of the Kudu name.
>>
>>
>>
>> Cloudera currently owns several domain names related to Kudu (getkudu.io,
>> kududb.io, et al) which will be transferred to the ASF and redirected to
>> the official page during incubation.
>>
>>
>>
>> Portions of Kudu are protected by pending or published patents owned by
>> Cloudera. Given the protections already granted by the Apache License, we
>> do not anticipate any explicit licensing or transfer of this intellectual
>> property.
>>
>> == External Dependencies ==
>>
>> The full set of dependencies and licenses is listed in
>> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>>
>> and summarized here:
>>
>> * '''Twitter Bootstrap''': Apache 2.0
>> * '''d3''': BSD 3-clause
>> * '''epoch JS library''': MIT
>> * '''lz4''': BSD 2-clause
>> * '''gflags''': BSD 3-clause
>> * '''glog''': BSD 3-clause
>> * '''gperftools''': BSD 3-clause
>> * '''libev''': BSD 2-clause
>> * '''squeasel''': MIT
>> * '''protobuf''': BSD 3-clause
>> * '''rapidjson''': MIT
>> * '''snappy''': BSD 3-clause
>> * '''trace-viewer''': BSD 3-clause
>> * '''zlib''': zlib license
>> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>> * '''bitshuffle''': MIT
>> * '''boost''': Boost license
>> * '''curl''': MIT
>> * '''libunwind''': MIT
>> * '''nvml''': BSD 3-clause
>> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>> * '''openssl''': OpenSSL License (BSD-alike)
>>
>> * '''Guava''': Apache 2.0
>> * '''StumbleUpon Async''': BSD
>> * '''Apache Hadoop''': Apache 2.0
>> * '''Apache log4j''': Apache 2.0
>> * '''Netty''': Apache 2.0
>> * '''slf4j''': MIT
>> * '''Apache Commons''': Apache 2.0
>> * '''murmur''': Apache 2.0
>>
>>
>> '''Build/test-only dependencies''':
>>
>> * '''CMake''': BSD 3-clause
>> * '''gcovr''': BSD 3-clause
>> * '''gmock''': BSD 3-clause
>> * '''Apache Maven''': Apache 2.0
>> * '''JUnit''': EPL
>> * '''Mockito''': MIT
>>
>> == Cryptography ==
>>
>> Kudu does not currently include any cryptography-related code.
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>>
>> * private@kudu.incubator.apache.org (PMC)
>> * commits@kudu.incubator.apache.org (git push emails)
>> * issues@kudu.incubator.apache.org (JIRA issue feed)
>> * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>> * user@kudu.incubator.apache.org (User questions)
>>
>>
>> === Repository ===
>>
>> * git://git.apache.org/kudu
>>
>> === Gerrit ===
>>
>> We hope to continue using Gerrit for our code review and commit workflow.
>> The Kudu team has already been in contact with Jake Farrell to start
>> discussions on how Gerrit can fit into the ASF. We know that several other
>> ASF projects and podlings are also interested in Gerrit.
>>
>>
>>
>> If the Infrastructure team does not have the bandwidth to support Gerrit,
>> we will continue to support our own instance of Gerrit for Kudu, and make
>> the necessary integrations such that commits are properly authenticated and
>> maintain sufficient provenance to uphold the ASF standards (e.g. via the
>> solution adopted by the AsterixDB podling).
>>
>> == Issue Tracking ==
>>
>> We would like to import our current JIRA project into the ASF JIRA, such
>> that our historical commit messages and code comments continue to reference
>> the appropriate bug numbers.
>>
>> == Initial Committers ==
>>
>> * Adar Dembo adar@cloudera.com
>> * Alex Feinberg alex@strlen.net
>> * Andrew Wang wang@apache.org
>> * Dan Burkert dan@cloudera.com
>> * David Alves dralves@apache.org
>> * Jean-Daniel Cryans jdcryans@apache.org
>> * Mike Percy mpercy@apache.org
>> * Misty Stanley-Jones misty@apache.org
>> * Todd Lipcon todd@apache.org
>>
>> The initial list of committers was seeded by listing those contributors who
>> have contributed 20 or more patches in the last 12 months, indicating that
>> they are active and have achieved merit through participation on the
>> project. We chose not to include other contributors who either have not yet
>> contributed a significant number of patches, or whose contributions are far
>> in the past and whom we don’t expect to be active within the ASF.
>>
>> == Affiliations ==
>>
>> * Adar Dembo - Cloudera
>> * Alex Feinberg - Forward Networks
>> * Andrew Wang - Cloudera
>> * Dan Burkert - Cloudera
>> * David Alves - Cloudera
>> * Jean-Daniel Cryans - Cloudera
>> * Mike Percy - Cloudera
>> * Misty Stanley-Jones - Cloudera
>> * Todd Lipcon - Cloudera
>>
>> == Sponsors ==
>>
>> === Champion ===
>>
>> * Todd Lipcon
>>
>> === Nominated Mentors ===
>>
>> * Jake Farrell - ASF Member and Infra team member, Acquia
>> * Brock Noland - ASF Member, StreamSets
>> * Michael Stack - ASF Member, Cloudera
>> * Jarek Jarcec Cecho - ASF Member, Cloudera
>> * Chris Mattmann - ASF Member, NASA JPL and USC
>> * Julien Le Dem - Incubator PMC, Dremio
>> * Carl Steinbach - ASF Member, LinkedIn
>>
>> === Sponsoring Entity ===
>>
>> The Apache Incubator


Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Hitesh Shah <hi...@apache.org>.

+1 (binding)

— Hitesh

On Nov 24, 2015, at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
> 
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
> 
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
> 
> Please cast your votes:
> 
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
> 
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
> 
> Thanks
> -Todd
> -----
> 
> = Kudu Proposal =
> 
> == Abstract ==
> 
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
> 
> == Proposal ==
> 
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
> 
> 
> 
> We propose to incubate Kudu as a project of the Apache Software Foundation.
> 
> == Background ==
> 
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
> 
> 
> 
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats have any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
> 
> 
> 
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
> 
> 
> 
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
> 
> == Rationale ==
> 
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
> 
> == Initial Goals ==
> 
> * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
> 
> == Current Status ==
> 
> ==== Releases ====
> 
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> 
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
> 
> 
> ==== Source ====
> 
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
> 
> This repository will be transitioned to Apache’s git hosting during
> incubation.
> 
> 
> 
> ==== Code review ====
> 
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> 
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
> 
> 
> 
> ==== Issue tracking ====
> 
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
> 
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
> 
> 
> 
> ==== Community discussion ====
> 
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
> 
> 
> 
> ==== Build Infrastructure ====
> 
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
> 
> 
> 
> ==== Development practices ====
> 
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
> 
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
> 
> === Meritocracy ===
> 
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
> 
> === Community ===
> 
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
> 
> * '''Cloudera''' is the original development sponsor for Kudu.
> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
> * '''Intel''' has contributed optimizations related to their hardware
> technologies.
> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
> * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
> * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
> 
> 
> 
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
> 
> === Core Developers ===
> 
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
> 
> 
> 
> Many of the developers are committers or PMC members on other Apache
> projects.
> 
> === Alignment ===
> 
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> 
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
> 
> === Inexperience with Open Source ===
> 
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
> 
> * our code reviews are fully public and documented on a mailing list
> * our daily development chatter is in a public chat room
> * we send out weekly “community status” reports highlighting news and
> contributions
> * we published our entire JIRA history and discuss bugs in the open
> * we published our entire Git commit history, going back three years (no
> squashing)
> 
> 
> 
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> 
> === Homogeneous Developers ===
> 
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
> 
> 
> 
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
> 
> === Reliance on Salaried Developers ===
> 
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
> 
> === Relationships with Other Apache Products ===
> 
> Kudu is currently related to the following other Apache projects:
> 
> * Hadoop: Kudu provides MapReduce input/output formats for integration
> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
> 
> 
> 
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
> 
> 
> 
> Kudu integrates with Impala, which is also being proposed for incubation.
> 
> 
> 
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
> 
> 
> 
> We look forward to continuing to integrate and collaborate with these
> communities.
> 
> === An Excessive Fascination with the Apache Brand ===
> 
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
> 
> == Documentation ==
> 
> * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
> 
> * https://github.com/cloudera/kudu/tree/master/docs
> 
> 
> 
> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
> 
> * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
> * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
> * Many design documents are stored within the source code repository as
> text files next to the code being documented.
> 
> == Source and Intellectual Property Submission Plan ==
> 
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
> 
> 
> 
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as in a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and have confirmed
> this via a recent thread on general@incubator.apache.org.
> 
> 
> 
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
> 
> 
> 
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al.), which will be transferred to the ASF and redirected to
> the official page during incubation.
> 
> 
> 
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
> 
> == External Dependencies ==
> 
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> 
> and summarized here:
> 
> * '''Twitter Bootstrap''': Apache 2.0
> * '''d3''': BSD 3-clause
> * '''epoch JS library''': MIT
> * '''lz4''': BSD 2-clause
> * '''gflags''': BSD 3-clause
> * '''glog''': BSD 3-clause
> * '''gperftools''': BSD 3-clause
> * '''libev''': BSD 2-clause
> * '''squeasel''': MIT license
> * '''protobuf''': BSD 3-clause
> * '''rapidjson''': MIT
> * '''snappy''': BSD 3-clause
> * '''trace-viewer''': BSD 3-clause
> * '''zlib''': zlib license
> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> * '''bitshuffle''': MIT
> * '''boost''': Boost license
> * '''curl''': MIT
> * '''libunwind''': MIT
> * '''nvml''': BSD 3-clause
> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> * '''openssl''': OpenSSL License (BSD-alike)
> 
> * '''Guava''': Apache 2.0
> * '''StumbleUpon Async''': BSD
> * '''Apache Hadoop''': Apache 2.0
> * '''Apache log4j''': Apache 2.0
> * '''Netty''': Apache 2.0
> * '''slf4j''': MIT
> * '''Apache Commons''': Apache 2.0
> * '''murmur''': Apache 2.0
> 
> 
> '''Build/test-only dependencies''':
> 
> * '''CMake''': BSD 3-clause
> * '''gcovr''': BSD 3-clause
> * '''gmock''': BSD 3-clause
> * '''Apache Maven''': Apache 2.0
> * '''JUnit''': EPL
> * '''Mockito''': MIT
> 
> == Cryptography ==
> 
> Kudu does not currently include any cryptography-related code.
> 
> == Required Resources ==
> 
> === Mailing lists ===
> 
> * private@kudu.incubator.apache.org (PMC)
> * commits@kudu.incubator.apache.org (git push emails)
> * issues@kudu.incubator.apache.org (JIRA issue feed)
> * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> * user@kudu.incubator.apache.org (User questions)
> 
> 
> === Repository ===
> 
> * git://git.apache.org/kudu
> 
> === Gerrit ===
> 
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
> 
> 
> 
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
> 
> == Issue Tracking ==
> 
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
> 
> == Initial Committers ==
> 
> * Adar Dembo adar@cloudera.com
> * Alex Feinberg alex@strlen.net
> * Andrew Wang wang@apache.org
> * Dan Burkert dan@cloudera.com
> * David Alves dralves@apache.org
> * Jean-Daniel Cryans jdcryans@apache.org
> * Mike Percy mpercy@apache.org
> * Misty Stanley-Jones misty@apache.org
> * Todd Lipcon todd@apache.org
> 
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and who we don’t expect to be active within the ASF.
> 
> == Affiliations ==
> 
> * Adar Dembo - Cloudera
> * Alex Feinberg - Forward Networks
> * Andrew Wang - Cloudera
> * Dan Burkert - Cloudera
> * David Alves - Cloudera
> * Jean-Daniel Cryans - Cloudera
> * Mike Percy - Cloudera
> * Misty Stanley-Jones - Cloudera
> * Todd Lipcon - Cloudera
> 
> == Sponsors ==
> 
> === Champion ===
> 
> * Todd Lipcon
> 
> === Nominated Mentors ===
> 
> * Jake Farrell - ASF Member and Infra team member, Acquia
> * Brock Noland - ASF Member, StreamSets
> * Michael Stack - ASF Member, Cloudera
> * Jarek Jarcec Cecho - ASF Member, Cloudera
> * Chris Mattmann - ASF Member, NASA JPL and USC
> * Julien Le Dem - Incubator PMC, Dremio
> * Carl Steinbach - ASF Member, LinkedIn
> 
> === Sponsoring Entity ===
> 
> The Apache Incubator


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Ashish <pa...@gmail.com>.

+1 (non-binding)

On Tue, Nov 24, 2015 at 11:40 AM, Jarek Jarcec Cecho <ja...@apache.org> wrote:
>> [X] +1, accept Kudu into the Incubator
>
> (binding)
>
> Jarcec
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal


Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Jarek Jarcec Cecho <ja...@apache.org>.

> [X] +1, accept Kudu into the Incubator

(binding)

Jarcec

> On Nov 24, 2015, at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
> 
> Hi all,
> 
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
> 
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
> 
> Please cast your votes:
> 
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
> 
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
> 
> Thanks
> -Todd
> -----
> 
> = Kudu Proposal =
> 
> == Abstract ==
> 
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
> 
> == Proposal ==
> 
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
> 
> 
> 
> We propose to incubate Kudu as a project of the Apache Software Foundation.
> 
> == Background ==
> 
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
> 
> 
> 
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
> 
> 
> 
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
> 
> 
> 
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
> 
> == Rationale ==
> 
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. Since our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
> 
> == Initial Goals ==
> 
> * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
> 
> == Current Status ==
> 
> ==== Releases ====
> 
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> 
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
> 
> 
> ==== Source ====
> 
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
> 
> This repository will be transitioned to Apache’s git hosting during
> incubation.
> 
> 
> 
> ==== Code review ====
> 
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> 
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
> 
> 
> 
> ==== Issue tracking ====
> 
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
> 
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
> 
> 
> 
> ==== Community discussion ====
> 
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
> 
> 
> 
> ==== Build Infrastructure ====
> 
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
> 
> 
> 
> ==== Development practices ====
> 
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
> 
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
> 
> === Meritocracy ===
> 
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
> 
> === Community ===
> 
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
> 
> * '''Cloudera''' is the original development sponsor for Kudu.
> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
> * '''Intel''' has contributed optimizations related to their hardware
> technologies.
> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
> * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
> * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
> 
> 
> 
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
> 
> === Core Developers ===
> 
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
> 
> 
> 
> Many of the developers are committers or PMC members on other Apache
> projects.
> 
> === Alignment ===
> 
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> 
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
> 
> === Inexperience with Open Source ===
> 
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
> 
> * our code reviews are fully public and documented on a mailing list
> * our daily development chatter is in a public chat room
> * we send out weekly “community status” reports highlighting news and
> contributions
> * we published our entire JIRA history and discuss bugs in the open
> * we published our entire Git commit history, going back three years (no
> squashing)
> 
> 
> 
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> 
> === Homogenous Developers ===
> 
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
> 
> 
> 
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
> 
> === Reliance on Salaried Developers ===
> 
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
> 
> === Relationships with Other Apache Products ===
> 
> Kudu is currently related to the following other Apache projects:
> 
> * Hadoop: Kudu provides MapReduce input/output formats for integration
> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
> 
> 
> 
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
> 
> 
> 
> Kudu integrates with Impala, which is also being proposed for incubation.
> 
> 
> 
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
> 
> 
> 
> We look forward to continuing to integrate and collaborate with these
> communities.
> 
> === An Excessive Fascination with the Apache Brand ===
> 
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
> 
> == Documentation ==
> 
> * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
> 
> * https://github.com/cloudera/kudu/tree/master/docs
> 
> 
> 
> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
> 
> * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
> * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
> * Many design documents are stored within the source code repository as
> text files next to the code being documented.
> 
> == Source and Intellectual Property Submission Plan ==
> 
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
> 
> 
> 
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed this
> via a recent thread on general@incubator.apache.org.
> 
> 
> 
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
> 
> 
> 
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
> 
> 
> 
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
> 
> == External Dependencies ==
> 
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> 
> and summarized here:
> 
> * '''Twitter Bootstrap''': Apache 2.0
> * '''d3''': BSD 3-clause
> * '''epoch JS library''': MIT
> * '''lz4''': BSD 2-clause
> * '''gflags''': BSD 3-clause
> * '''glog''': BSD 3-clause
> * '''gperftools''': BSD 3-clause
> * '''libev''': BSD 2-clause
> * '''squeasel''': MIT
> * '''protobuf''': BSD 3-clause
> * '''rapidjson''': MIT
> * '''snappy''': BSD 3-clause
> * '''trace-viewer''': BSD 3-clause
> * '''zlib''': zlib license
> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> * '''bitshuffle''': MIT
> * '''boost''': Boost license
> * '''curl''': MIT
> * '''libunwind''': MIT
> * '''nvml''': BSD 3-clause
> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> * '''openssl''': OpenSSL License (BSD-alike)
> 
> * '''Guava''': Apache 2.0
> * '''StumbleUpon Async''': BSD
> * '''Apache Hadoop''': Apache 2.0
> * '''Apache log4j''': Apache 2.0
> * '''Netty''': Apache 2.0
> * '''slf4j''': MIT
> * '''Apache Commons''': Apache 2.0
> * '''murmur''': Apache 2.0
> 
> 
> '''Build/test-only dependencies''':
> 
> * '''CMake''': BSD 3-clause
> * '''gcovr''': BSD 3-clause
> * '''gmock''': BSD 3-clause
> * '''Apache Maven''': Apache 2.0
> * '''JUnit''': EPL
> * '''Mockito''': MIT
> 
> == Cryptography ==
> 
> Kudu does not currently include any cryptography-related code.
> 
> == Required Resources ==
> 
> === Mailing lists ===
> 
> * private@kudu.incubator.apache.org (PMC)
> * commits@kudu.incubator.apache.org (git push emails)
> * issues@kudu.incubator.apache.org (JIRA issue feed)
> * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> * user@kudu.incubator.apache.org (User questions)
> 
> 
> === Repository ===
> 
> * git://git.apache.org/kudu
> 
> === Gerrit ===
> 
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
> 
> 
> 
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
> 
> == Issue Tracking ==
> 
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
> 
> == Initial Committers ==
> 
> * Adar Dembo adar@cloudera.com
> * Alex Feinberg alex@strlen.net
> * Andrew Wang wang@apache.org
> * Dan Burkert dan@cloudera.com
> * David Alves dralves@apache.org
> * Jean-Daniel Cryans jdcryans@apache.org
> * Mike Percy mpercy@apache.org
> * Misty Stanley-Jones misty@apache.org
> * Todd Lipcon todd@apache.org
> 
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
> 
> == Affiliations ==
> 
> * Adar Dembo - Cloudera
> * Alex Feinberg - Forward Networks
> * Andrew Wang - Cloudera
> * Dan Burkert - Cloudera
> * David Alves - Cloudera
> * Jean-Daniel Cryans - Cloudera
> * Mike Percy - Cloudera
> * Misty Stanley-Jones - Cloudera
> * Todd Lipcon - Cloudera
> 
> == Sponsors ==
> 
> === Champion ===
> 
> * Todd Lipcon
> 
> === Nominated Mentors ===
> 
> * Jake Farrell - ASF Member and Infra team member, Acquia
> * Brock Noland - ASF Member, StreamSets
> * Michael Stack - ASF Member, Cloudera
> * Jarek Jarcec Cecho - ASF Member, Cloudera
> * Chris Mattmann - ASF Member, NASA JPL and USC
> * Julien Le Dem - Incubator PMC, Dremio
> * Carl Steinbach - ASF Member, LinkedIn
> 
> === Sponsoring Entity ===
> 
> The Apache Incubator



Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Owen O'Malley <om...@apache.org>.

+1 (binding)

On Tue, Nov 24, 2015 at 9:13 PM, Ralph Goers <ra...@dslextreme.com>
wrote:

> -1 (binding)
> I’d like to see the project start with CTR and use RTC only for specific
> cases (like where tests must be modified, over X (1000 lines?) of code
> added, etc.).
>
> I must say I do find the part about achieving quality through automation
> attractive, but following that up with requiring RTC leads me to conclude
> that the project doesn’t really believe that to be true.
>
> Ralph
>
> > On Nov 24, 2015, at 12:32 PM, Todd Lipcon <to...@apache.org> wrote:
> >
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -----
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations with
> > other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated and
> > captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> > These systems allow for low-latency record-level reads and writes, but lag
> > far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up to
> > fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in some
> > situations, Kudu offers a “happy medium” alternative that can dramatically
> > simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source storage
> > ecosystem. After our initial open source project release in September 2015,
> > we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build an
> > equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> > * Move the existing codebase, website, documentation, and mailing lists to
> > Apache-hosted infrastructure
> > * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> > * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> >
> > ==== Releases ====
> >
> > Kudu has undergone one public release, tagged here
> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >
> > This initial release was not performed in the typical ASF fashion -- no
> > source tarball was released, but rather only convenience binaries made
> > available in Cloudera’s repositories. We will adopt the ASF source release
> > process upon joining the incubator.
> >
> >
> > ==== Source ====
> >
> > Kudu’s source is currently hosted on GitHub at
> > https://github.com/cloudera/kudu
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> >
> >
> > ==== Code review ====
> >
> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >
> > The Kudu developer community is very happy with Gerrit and hopes to work
> > with the Apache Infrastructure team to figure out how we can continue to
> > use Gerrit within ASF policies.
> >
> >
> >
> > ==== Issue tracking ====
> >
> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/KUDU/summary
> >
> > This JIRA instance contains bugs and development discussion dating back 2
> > years prior to Kudu’s open source release and will provide an initial seed
> > for the ASF JIRA.
> >
> >
> >
> > ==== Community discussion ====
> >
> > Kudu has several public discussion forums, linked here:
> > http://getkudu.io/community.html
> >
> >
> >
> > ==== Build Infrastructure ====
> >
> > The Kudu Gerrit instance is configured to only allow patches to be
> > committed after running them through an extensive set of pre-commit tests
> > and code lints. The project currently makes use of elastic public cloud
> > resources to perform these tests. Until this point, these resources have
> > been internal to Cloudera, though we are currently investing in moving to a
> > publicly accessible infrastructure.
> >
> >
> >
> > ==== Development practices ====
> >
> > Given that Kudu is a persistent storage engine, the community has a high
> > quality bar for contributions to its core. We have a firm belief that high
> > quality is achieved through automation, not manual inspection, and hence
> > put a focus on thorough testing and build infrastructure to ensure that
> > bar. The development community also practices review-then-commit for all
> > changes to ensure that changes are accompanied by appropriate tests, are
> > well commented, etc.
> >
> > Rather than seeing these practices as barriers to contribution, we believe
> > that a fully automated and standardized review and testing practice makes
> > it easier for new contributors to have patches accepted. Any new developer
> > may post a patch to Gerrit using the same workflow as a seasoned
> > contributor, and the same suite of tests will be automatically run. If the
> > tests pass, a committer can quickly review and commit the contribution from
> > their web browser.
> >
> > === Meritocracy ===
> >
> > We believe strongly in meritocracy in electing committers and PMC members.
> > We believe that contributions can come in forms other than just code: for
> > example, one of our initial proposed committers has contributed solely in
> > the area of project documentation. We will encourage contributions and
> > participation of all types, and ensure that contributors are appropriately
> > recognized.
> >
> > === Community ===
> >
> > Though Kudu is relatively new as an open source project, it has already
> > seen promising growth in its community across several organizations:
> >
> > * '''Cloudera''' is the original development sponsor for Kudu.
> > * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > production use case, contributing code, benchmarks, feedback, and
> > conference talks.
> > * '''Intel''' has contributed optimizations related to their hardware
> > technologies.
> > * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> > use case, and has been contributing bug reports and product feedback.
> > * '''Dremio''' is working on integration with Apache Drill and exploring
> > using Kudu in a production use case.
> > * Several community-built Docker images, tutorials, and blog posts have
> > sprouted up since Kudu’s release.
> >
> >
> >
> > By bringing Kudu to Apache, we hope to encourage further contribution from
> > the above organizations as well as to engage new users and contributors in
> > the community.
> >
> > === Core Developers ===
> >
> > Kudu was initially developed as a project at Cloudera. Most of the
> > contributions to date have been by developers employed by Cloudera.
> >
> >
> >
> > Many of the developers are committers or PMC members on other Apache
> > projects.
> >
> > === Alignment ===
> >
> > As a project in the big data ecosystem, Kudu is aligned with several other
> > ASF projects. Kudu includes input/output format integration with Apache
> > Hadoop, and this integration can also provide a bridge to Apache Spark. We
> > are planning to integrate with Apache Hive in the near future. We also
> > integrate closely with Cloudera Impala, which is also currently being
> > proposed for incubation. We have also scheduled a hackathon with the Apache
> > Drill team to work on integration with that query engine.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> > in the initial development of the project, and intends to grow its
> > investment over time as Kudu becomes a product adopted by its customer
> > base. Several other organizations are also experimenting with Kudu for
> > production use cases which would live for many years.
> >
> > === Inexperience with Open Source ===
> >
> > Kudu has been released in the open for less than two months. However, from
> > our very first public announcement we have been committed to open-source
> > style development:
> >
> > * our code reviews are fully public and documented on a mailing list
> > * our daily development chatter is in a public chat room
> > * we send out weekly “community status” reports highlighting news and
> > contributions
> > * we published our entire JIRA history and discuss bugs in the open
> > * we published our entire Git commit history, going back three years (no
> > squashing)
> >
> >
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects (Hadoop,
> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employees or former employees of Cloudera.
> > However, the committers are spread across multiple offices (Palo Alto, San
> > Francisco, Melbourne), so the team is familiar with working in a
> > distributed environment across varied time zones.
> >
> >
> >
> > The project has received some contributions from developers outside of
> > Cloudera, and is starting to attract a ''user'' community as well. We hope
> > to continue to encourage contributions from these developers and community
> > members and grow them into committers after they have had time to continue
> > their contributions.
> >
> > === Reliance on Salaried Developers ===
> >
> > As mentioned above, the majority of development up to this point has been
> > sponsored by Cloudera. We have seen several community users participate in
> > discussions who are hobbyists interested in distributed systems and
> > databases, and hope that they will continue their participation in the
> > project going forward.
> >
> > === Relationships with Other Apache Products ===
> >
> > Kudu is currently related to the following other Apache projects:
> >
> > * Hadoop: Kudu provides MapReduce input/output formats for integration
> > * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> > and work is progressing on support for Spark Data Frames and Spark SQL.
> >
> >
> >
> > The Kudu team has reached out to several other Apache projects to start
> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> >
> >
> >
> > Kudu integrates with Impala, which is also being proposed for incubation.
> >
> >
> >
> > Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> > from the Apache Drill community.
> >
> >
> >
> > We look forward to continuing to integrate and collaborate with these
> > communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Many of the initial committers are already experienced Apache committers,
> > and understand the true value provided by the Apache Way and the principles
> > of the ASF. We believe that this development and contribution model is
> > especially appropriate for storage products, where Apache’s
> > community-over-code philosophy ensures long term viability and
> > consensus-based participation.
> >
> > == Documentation ==
> >
> > * Documentation is written in AsciiDoc and committed in the Kudu source
> > repository:
> >
> > * https://github.com/cloudera/kudu/tree/master/docs
> >
> >
> >
> > * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> > above repository.
> >
> > * A LaTeX whitepaper is also published, and the source is available within
> > the same repository.
> > * APIs are documented within the source code as JavaDoc or C++-style
> > documentation comments.
> > * Many design documents are stored within the source code repository as
> > text files next to the code being documented.
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > The Kudu codebase and web site are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Kudu is already
> > licensed under the Apache 2.0 license.
> >
> >
> >
> > Some portions of the code are imported from other open source projects
> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> > other than the initial committers. These copyright notices are maintained
> > in those files as well as a top-level NOTICE.txt file. We believe this to
> > be permissible under the license terms and ASF policies, and confirmed this
> > via a recent thread on general@incubator.apache.org.
> >
> >
> >
> > The “Kudu” name is not a registered trademark, though before the initial
> > release of the project, we performed a trademark search and Cloudera’s
> > legal counsel deemed it acceptable in the context of a data storage
> > engine. There exists an unrelated open source project by the same name
> > related to deployments on Microsoft’s Azure cloud service. We have been in
> > contact with legal counsel from Microsoft and have obtained their approval
> > for the use of the Kudu name.
> >
> >
> >
> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> > kududb.io, et al) which will be transferred to the ASF and redirected to
> > the official page during incubation.
> >
> >
> >
> > Portions of Kudu are protected by pending or published patents owned by
> > Cloudera. Given the protections already granted by the Apache License, we
> > do not anticipate any explicit licensing or transfer of this intellectual
> > property.
> >
> > == External Dependencies ==
> >
> > The full set of dependencies and licenses are listed in
> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >
> > and summarized here:
> >
> > * '''Twitter Bootstrap''': Apache 2.0
> > * '''d3''': BSD 3-clause
> > * '''epoch JS library''': MIT
> > * '''lz4''': BSD 2-clause
> > * '''gflags''': BSD 3-clause
> > * '''glog''': BSD 3-clause
> > * '''gperftools''': BSD 3-clause
> > * '''libev''': BSD 2-clause
> > * '''squeasel''': MIT license
> > * '''protobuf''': BSD 3-clause
> > * '''rapidjson''': MIT
> > * '''snappy''': BSD 3-clause
> > * '''trace-viewer''': BSD 3-clause
> > * '''zlib''': zlib license
> > * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> > * '''bitshuffle''': MIT
> > * '''boost''': Boost license
> > * '''curl''': MIT
> > * '''libunwind''': MIT
> > * '''nvml''': BSD 3-clause
> > * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> > * '''openssl''': OpenSSL License (BSD-alike)
> >
> > * '''Guava''': Apache 2.0
> > * '''StumbleUpon Async''': BSD
> > * '''Apache Hadoop''': Apache 2.0
> > * '''Apache log4j''': Apache 2.0
> > * '''Netty''': Apache 2.0
> > * '''slf4j''': MIT
> > * '''Apache Commons''': Apache 2.0
> > * '''murmur''': Apache 2.0
> >
> >
> > '''Build/test-only dependencies''':
> >
> > * '''CMake''': BSD 3-clause
> > * '''gcovr''': BSD 3-clause
> > * '''gmock''': BSD 3-clause
> > * '''Apache Maven''': Apache 2.0
> > * '''JUnit''': EPL
> > * '''Mockito''': MIT
> >
> > == Cryptography ==
> >
> > Kudu does not currently include any cryptography-related code.
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> > * private@kudu.incubator.apache.org (PMC)
> > * commits@kudu.incubator.apache.org (git push emails)
> > * issues@kudu.incubator.apache.org (JIRA issue feed)
> > * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev
> > discussion)
> > * user@kudu.incubator.apache.org (User questions)
> >
> >
> > === Repository ===
> >
> > * git://git.apache.org/kudu
> >
> > === Gerrit ===
> >
> > We hope to continue using Gerrit for our code review and commit workflow.
> > The Kudu team has already been in contact with Jake Farrell to start
> > discussions on how Gerrit can fit into the ASF. We know that several
> > other ASF projects and podlings are also interested in Gerrit.
> >
> >
> >
> > If the Infrastructure team does not have the bandwidth to support Gerrit,
> > we will continue to support our own instance of Gerrit for Kudu, and make
> > the necessary integrations such that commits are properly authenticated
> > and maintain sufficient provenance to uphold the ASF standards (e.g. via
> > the solution adopted by the AsterixDB podling).
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to
> > reference the appropriate bug numbers.
> >
> > == Initial Committers ==
> >
> > * Adar Dembo adar@cloudera.com
> > * Alex Feinberg alex@strlen.net
> > * Andrew Wang wang@apache.org
> > * Dan Burkert dan@cloudera.com
> > * David Alves dralves@apache.org
> > * Jean-Daniel Cryans jdcryans@apache.org
> > * Mike Percy mpercy@apache.org
> > * Misty Stanley-Jones misty@apache.org
> > * Todd Lipcon todd@apache.org
> >
> > The initial list of committers was seeded by listing those contributors
> > who have contributed 20 or more patches in the last 12 months, indicating
> > that they are active and have achieved merit through participation on the
> > project. We chose not to include other contributors who either have not
> > yet contributed a significant number of patches, or whose contributions
> > are far in the past and we don’t expect to be active within the ASF.
> >
> > == Affiliations ==
> >
> > * Adar Dembo - Cloudera
> > * Alex Feinberg - Forward Networks
> > * Andrew Wang - Cloudera
> > * Dan Burkert - Cloudera
> > * David Alves - Cloudera
> > * Jean-Daniel Cryans - Cloudera
> > * Mike Percy - Cloudera
> > * Misty Stanley-Jones - Cloudera
> > * Todd Lipcon - Cloudera
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> > * Todd Lipcon
> >
> > === Nominated Mentors ===
> >
> > * Jake Farrell - ASF Member and Infra team member, Acquia
> > * Brock Noland - ASF Member, StreamSets
> > * Michael Stack - ASF Member, Cloudera
> > * Jarek Jarcec Cecho - ASF Member, Cloudera
> > * Chris Mattmann - ASF Member, NASA JPL and USC
> > * Julien Le Dem - Incubator PMC, Dremio
> > * Carl Steinbach - ASF Member, LinkedIn
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Ralph Goers <ra...@dslextreme.com>.

-1 (binding)
I’d like to see the project start with CTR and use RTC only for specific cases (like where tests must be modified, over X (1000 lines?) of code added, etc.).

I must say I do find the part about achieving quality through automation attractive, but following that up with requiring RTC leads me to conclude that the project doesn’t really believe that to be true.

Ralph

> On Nov 24, 2015, at 12:32 PM, Todd Lipcon <to...@apache.org> wrote:
> 
> Hi all,
> 
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
> 
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
> 
> Please cast your votes:
> 
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
> 
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
> 
> Thanks
> -Todd
> -----
> 
> = Kudu Proposal =
> 
> == Abstract ==
> 
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
> 
> == Proposal ==
> 
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
> 
> 
> 
> We propose to incubate Kudu as a project of the Apache Software Foundation.
> 
> == Background ==
> 
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
> 
> 
> 
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
> 
> 
> 
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
> 
> 
> 
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
> 
> == Rationale ==
> 
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
> 
> == Initial Goals ==
> 
> * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
> 
> == Current Status ==
> 
> ==== Releases ====
> 
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> 
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
> 
> 
> ==== Source ====
> 
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
> 
> This repository will be transitioned to Apache’s git hosting during
> incubation.
> 
> 
> 
> ==== Code review ====
> 
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> 
> The Kudu developer community is very happy with gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
> 
> 
> 
> ==== Issue tracking ====
> 
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
> 
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
> 
> 
> 
> ==== Community discussion ====
> 
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
> 
> 
> 
> ==== Build Infrastructure ====
> 
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
> 
> 
> 
> ==== Development practices ====
> 
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
> 
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
> 
> === Meritocracy ===
> 
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
> 
> === Community ===
> 
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
> 
> * '''Cloudera''' is the original development sponsor for Kudu.
> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
> * '''Intel''' has contributed optimizations related to their hardware
> technologies.
> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
> * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
> * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
> 
> 
> 
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
> 
> === Core Developers ===
> 
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
> 
> 
> 
> Many of the developers are committers or PMC members on other Apache
> projects.
> 
> === Alignment ===
> 
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> 
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
> 
> === Inexperience with Open Source ===
> 
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
> 
> * our code reviews are fully public and documented on a mailing list
> * our daily development chatter is in a public chat room
> * we send out weekly “community status” reports highlighting news and
> contributions
> * we published our entire JIRA history and discuss bugs in the open
> * we published our entire Git commit history, going back three years (no
> squashing)
> 
> 
> 
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> 
> === Homogenous Developers ===
> 
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
> 
> 
> 
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
> 
> === Reliance on Salaried Developers ===
> 
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
> 
> === Relationships with Other Apache Products ===
> 
> Kudu is currently related to the following other Apache projects:
> 
> * Hadoop: Kudu provides MapReduce input/output formats for integration
> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
> 
> 
> 
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
> 
> 
> 
> Kudu integrates with Impala, which is also being proposed for incubation.
> 
> 
> 
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
> 
> 
> 
> We look forward to continuing to integrate and collaborate with these
> communities.
> 
> === An Excessive Fascination with the Apache Brand ===
> 
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
> 
> == Documentation ==
> 
> * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
> 
> * https://github.com/cloudera/kudu/tree/master/docs
> 
> 
> 
> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
> 
> * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
> * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
> * Many design documents are stored within the source code repository as
> text files next to the code being documented.
> 
> == Source and Intellectual Property Submission Plan ==
> 
> The Kudu codebase and web site is currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
> 
> 
> 
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
> 
> 
> 
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
> 
> 
> 
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
> 
> 
> 
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
> 
> == External Dependencies ==
> 
> The full set of dependencies and licenses are listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> 
> and summarized here:
> 
> * '''Twitter Bootstrap''': Apache 2.0
> * '''d3''': BSD 3-clause
> * '''epoch JS library''': MIT
> * '''lz4''': BSD 2-clause
> * '''gflags''': BSD 3-clause
> * '''glog''': BSD 3-clause
> * '''gperftools''': BSD 3-clause
> * '''libev''': BSD 2-clause
> * '''squeasel''': MIT license
> * '''protobuf''': BSD 3-clause
> * '''rapidjson''': MIT
> * '''snappy''': BSD 3-clause
> * '''trace-viewer''': BSD 3-clause
> * '''zlib''': zlib license
> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> * '''bitshuffle''': MIT
> * '''boost''': Boost license
> * '''curl''': MIT
> * '''libunwind''': MIT
> * '''nvml''': BSD 3-clause
> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> * '''openssl''': OpenSSL License (BSD-alike)
> 
> * '''Guava''': Apache 2.0
> * '''StumbleUpon Async''': BSD
> * '''Apache Hadoop''': Apache 2.0
> * '''Apache log4j''': Apache 2.0
> * '''Netty''': Apache 2.0
> * '''slf4j''': MIT
> * '''Apache Commons''': Apache 2.0
> * '''murmur''': Apache 2.0
> 
> 
> '''Build/test-only dependencies''':
> 
> * '''CMake''': BSD 3-clause
> * '''gcovr''': BSD 3-clause
> * '''gmock''': BSD 3-clause
> * '''Apache Maven''': Apache 2.0
> * '''JUnit''': EPL
> * '''Mockito''': MIT
> 
> == Cryptography ==
> 
> Kudu does not currently include any cryptography-related code.
> 
> == Required Resources ==
> 
> === Mailing lists ===
> 
> * private@kudu.incubator.apache.org (PMC)
> * commits@kudu.incubator.apache.org (git push emails)
> * issues@kudu.incubator.apache.org (JIRA issue feed)
> * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> * user@kudu.incubator.apache.org (User questions)
> 
> 
> === Repository ===
> 
> * git://git.apache.org/kudu
> 
> === Gerrit ===
> 
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
> 
> 
> 
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
> 
> == Issue Tracking ==
> 
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
> 
> == Initial Committers ==
> 
> * Adar Dembo adar@cloudera.com
> * Alex Feinberg alex@strlen.net
> * Andrew Wang wang@apache.org
> * Dan Burkert dan@cloudera.com
> * David Alves dralves@apache.org
> * Jean-Daniel Cryans jdcryans@apache.org
> * Mike Percy mpercy@apache.org
> * Misty Stanley-Jones misty@apache.org
> * Todd Lipcon todd@apache.org
> 
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and we don’t expect to be active within the ASF.
> 
> == Affiliations ==
> 
> * Adar Dembo - Cloudera
> * Alex Feinberg - Forward Networks
> * Andrew Wang - Cloudera
> * Dan Burkert - Cloudera
> * David Alves - Cloudera
> * Jean-Daniel Cryans - Cloudera
> * Mike Percy - Cloudera
> * Misty Stanley-Jones - Cloudera
> * Todd Lipcon - Cloudera
> 
> == Sponsors ==
> 
> === Champion ===
> 
> * Todd Lipcon
> 
> === Nominated Mentors ===
> 
> * Jake Farrell - ASF Member and Infra team member, Acquia
> * Brock Noland - ASF Member, StreamSets
> * Michael Stack - ASF Member, Cloudera
> * Jarek Jarcec Cecho - ASF Member, Cloudera
> * Chris Mattmann - ASF Member, NASA JPL and USC
> * Julien Le Dem - Incubator PMC, Dremio
> * Carl Steinbach - ASF Member, LinkedIn
> 
> === Sponsoring Entity ===
> 
> The Apache Incubator




Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Rob Vesse <rv...@dotnetrdf.org>.

+1 (binding)

Rob

On 24/11/2015 19:32, "Todd Lipcon" <todd@cloudera.com on behalf of
todd@apache.org> wrote:

>Hi all,
>
>Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
>to call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal
>is pasted below and also available on the wiki at:
>https://wiki.apache.org/incubator/KuduProposal
>
>The proposal is unchanged since the original version, except for the
>addition of Carl Steinbach as a Mentor.
>
>Please cast your votes:
>
>[] +1, accept Kudu into the Incubator
>[] +/-0, positive/negative non-counted expression of feelings
>[] -1, do not accept Kudu into the incubator (please state reasoning)
>
>Given the US holiday this week, I imagine many folks are traveling or
>otherwise offline. So, let's run the vote for a full week rather than the
>traditional 72 hours. Unless the IPMC objects to the extended voting
>period, the vote will close on Tues, Dec 1st at noon PST.
>
>Thanks
>-Todd
>-----
>
>= Kudu Proposal =
>
>== Abstract ==
>
>Kudu is a distributed columnar storage engine built for the Apache Hadoop
>ecosystem.
>
>== Proposal ==
>
>Kudu is an open source storage engine for structured data which supports
>low-latency random access together with efficient analytical access
>patterns. Kudu distributes data using horizontal partitioning and
>replicates each partition using Raft consensus, providing low
>mean-time-to-recovery and low tail latencies. Kudu is designed within the
>context of the Apache Hadoop ecosystem and supports many integrations with
>other data analytics projects both inside and outside of the Apache
>Software Foundation.
>
>
>
>We propose to incubate Kudu as a project of the Apache Software
>Foundation.
>
>== Background ==
>
>In recent years, explosive growth in the amount of data being generated
>and captured by enterprises has resulted in the rapid adoption of open
>source technology which is able to store massive data sets at scale and at
>low cost. In particular, the Apache Hadoop ecosystem has become a focal
>point for such “big data” workloads, because many traditional open source
>database systems have lagged in offering a scalable alternative.
>
>
>
>Structured storage in the Hadoop ecosystem has typically been achieved in
>two ways: for static data sets, data is typically stored on Apache HDFS
>using binary data formats such as Apache Avro or Apache Parquet. However,
>neither HDFS nor these formats has any provision for updating individual
>records, or for efficient random access. Mutable data sets are typically
>stored in semi-structured stores such as Apache HBase or Apache Cassandra.
>These systems allow for low-latency record-level reads and writes, but lag
>far behind the static file formats in terms of sequential read throughput
>for applications such as SQL-based analytics or machine learning.
>
>
>
>Kudu is a new storage system designed and implemented from the ground up
>to fill this gap between high-throughput sequential-access storage systems
>such as HDFS and low-latency random-access systems such as HBase or
>Cassandra. While these existing systems continue to hold advantages in
>some situations, Kudu offers a “happy medium” alternative that can
>dramatically simplify the architecture of many common workloads. In
>particular, Kudu offers a simple API for row-level inserts, updates, and
>deletes, while providing table scans at throughputs similar to Parquet, a
>commonly-used columnar format for static data.
>
>
>
>More information on Kudu can be found at the existing open source project
>website: http://getkudu.io and in particular in the Kudu white-paper PDF:
>http://getkudu.io/kudu.pdf from which the above was excerpted.
>
>== Rationale ==
>
>As described above, Kudu fills an important gap in the open source storage
>ecosystem. After our initial open source project release in September
>2015, we have seen a great amount of interest across a diverse set of users
>and companies. We believe that, as a storage system, it is critical to
>build an equally diverse set of contributors in the development community.
>Our experiences as committers and PMC members on other Apache projects have
>taught us the value of diverse communities in ensuring both longevity and
>high quality for such foundational systems.
>
>== Initial Goals ==
>
> * Move the existing codebase, website, documentation, and mailing lists
>to Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
>review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
>
>== Current Status ==
>
>==== Releases ====
>
>Kudu has undergone one public release, tagged here
>https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
>This initial release was not performed in the typical ASF fashion -- no
>source tarball was released, but rather only convenience binaries made
>available in Cloudera’s repositories. We will adopt the ASF source release
>process upon joining the incubator.
>
>
>==== Source ====
>
>Kudu’s source is currently hosted on GitHub at
>https://github.com/cloudera/kudu
>
>This repository will be transitioned to Apache’s git hosting during
>incubation.
>
>
>
>==== Code review ====
>
>Kudu’s code reviews are currently public and hosted on Gerrit at
>http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
>The Kudu developer community is very happy with gerrit and hopes to work
>with the Apache Infrastructure team to figure out how we can continue to
>use Gerrit within ASF policies.
>
>
>
>==== Issue tracking ====
>
>Kudu’s bug and feature tracking is hosted on JIRA at:
>https://issues.cloudera.org/projects/KUDU/summary
>
>This JIRA instance contains bugs and development discussion dating back 2
>years prior to Kudu’s open source release and will provide an initial seed
>for the ASF JIRA.
>
>
>
>==== Community discussion ====
>
>Kudu has several public discussion forums, linked here:
>http://getkudu.io/community.html
>
>
>
>==== Build Infrastructure ====
>
>The Kudu Gerrit instance is configured to only allow patches to be
>committed after running them through an extensive set of pre-commit tests
>and code lints. The project currently makes use of elastic public cloud
>resources to perform these tests. Until this point, these resources have
>been internal to Cloudera, though we are currently investing in moving to a
>publicly accessible infrastructure.
>
>
>
>==== Development practices ====
>
>Given that Kudu is a persistent storage engine, the community has a high
>quality bar for contributions to its core. We have a firm belief that high
>quality is achieved through automation, not manual inspection, and hence
>put a focus on thorough testing and build infrastructure to ensure that
>bar. The development community also practices review-then-commit for all
>changes to ensure that changes are accompanied by appropriate tests, are
>well commented, etc.
>
>Rather than seeing these practices as barriers to contribution, we believe
>that a fully automated and standardized review and testing practice makes
>it easier for new contributors to have patches accepted. Any new developer
>may post a patch to Gerrit using the same workflow as a seasoned
>contributor, and the same suite of tests will be automatically run. If the
>tests pass, a committer can quickly review and commit the contribution from
>their web browser.
>
>=== Meritocracy ===
>
>We believe strongly in meritocracy in electing committers and PMC members.
>We believe that contributions can come in forms other than just code: for
>example, one of our initial proposed committers has contributed solely in
>the area of project documentation. We will encourage contributions and
>participation of all types, and ensure that contributors are appropriately
>recognized.
>
>=== Community ===
>
>Though Kudu is relatively new as an open source project, it has already
>seen promising growth in its community across several organizations:
>
> * '''Cloudera''' is the original development sponsor for Kudu.
> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
>production use case, contributing code, benchmarks, feedback, and
>conference talks.
> * '''Intel''' has contributed optimizations related to their hardware
>technologies.
> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
>use case, and has been contributing bug reports and product feedback.
> * '''Dremio''' is working on integration with Apache Drill and exploring
>using Kudu in a production use case.
> * Several community-built Docker images, tutorials, and blog posts have
>sprouted up since Kudu’s release.
>
>
>
>By bringing Kudu to Apache, we hope to encourage further contribution from
>the above organizations as well as to engage new users and contributors in
>the community.
>
>=== Core Developers ===
>
>Kudu was initially developed as a project at Cloudera. Most of the
>contributions to date have been by developers employed by Cloudera.
>
>
>
>Many of the developers are committers or PMC members on other Apache
>projects.
>
>=== Alignment ===
>
>As a project in the big data ecosystem, Kudu is aligned with several other
>ASF projects. Kudu includes input/output format integration with Apache
>Hadoop, and this integration can also provide a bridge to Apache Spark. We
>are planning to integrate with Apache Hive in the near future. We also
>integrate closely with Cloudera Impala, which is also currently being
>proposed for incubation. We have also scheduled a hackathon with the Apache
>Drill team to work on integration with that query engine.
>
>== Known Risks ==
>
>=== Orphaned Products ===
>
>The risk of Kudu being abandoned is low. Cloudera has invested a great deal
>in the initial development of the project, and intends to grow its
>investment over time as Kudu becomes a product adopted by its customer
>base. Several other organizations are also experimenting with Kudu for
>production use cases which would live for many years.
>
>=== Inexperience with Open Source ===
>
>Kudu has been released in the open for less than two months. However, from
>our very first public announcement we have been committed to open-source
>style development:
>
> * our code reviews are fully public and documented on a mailing list
> * our daily development chatter is in a public chat room
> * we send out weekly “community status” reports highlighting news and
>contributions
> * we published our entire JIRA history and discuss bugs in the open
> * we published our entire Git commit history, going back three years (no
>squashing)
>
>
>
>Several of the initial committers are experienced open source developers,
>several being committers and/or PMC members on other ASF projects (Hadoop,
>HBase, Thrift, Flume, et al). Those who are not ASF committers have
>experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
>=== Homogenous Developers ===
>
>The initial committers are employees or former employees of Cloudera.
>However, the committers are spread across multiple offices (Palo Alto, San
>Francisco, Melbourne), so the team is familiar with working in a
>distributed environment across varied time zones.
>
>
>
>The project has received some contributions from developers outside of
>Cloudera, and is starting to attract a ''user'' community as well. We hope
>to continue to encourage contributions from these developers and community
>members and grow them into committers after they have had time to continue
>their contributions.
>
>=== Reliance on Salaried Developers ===
>
>As mentioned above, the majority of development up to this point has been
>sponsored by Cloudera. We have seen several community users participate in
>discussions who are hobbyists interested in distributed systems and
>databases, and hope that they will continue their participation in the
>project going forward.
>
>=== Relationships with Other Apache Products ===
>
>Kudu is currently related to the following other Apache projects:
>
> * Hadoop: Kudu provides MapReduce input/output formats for integration
> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
>and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
>The Kudu team has reached out to several other Apache projects to start
>discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
>Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
>Kudu is already collaborating on ValueVector, a proposed TLP spinning out
>from the Apache Drill community.
>
>
>
>We look forward to continuing to integrate and collaborate with these
>communities.
>
>=== An Excessive Fascination with the Apache Brand ===
>
>Many of the initial committers are already experienced Apache committers,
>and understand the true value provided by the Apache Way and the principles
>of the ASF. We believe that this development and contribution model is
>especially appropriate for storage products, where Apache’s
>community-over-code philosophy ensures long term viability and
>consensus-based participation.
>
>== Documentation ==
>
> * Documentation is written in AsciiDoc and committed in the Kudu source
>repository:
>
> * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
>above repository.
>
> * A LaTeX whitepaper is also published, and the source is available within
>the same repository.
> * APIs are documented within the source code as JavaDoc or C++-style
>documentation comments.
> * Many design documents are stored within the source code repository as
>text files next to the code being documented.
>
>== Source and Intellectual Property Submission Plan ==
>
>The Kudu codebase and web site are currently hosted on GitHub and will be
>transitioned to the ASF repositories during incubation. Kudu is already
>licensed under the Apache 2.0 license.
>
>
>
>Some portions of the code are imported from other open source projects
>under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
>other than the initial committers. These copyright notices are maintained
>in those files as well as a top-level NOTICE.txt file. We believe this to
>be permissible under the license terms and ASF policies, and confirmed via
>a recent thread on general@incubator.apache.org .
>
>
>
>The “Kudu” name is not a registered trademark, though before the initial
>release of the project, we performed a trademark search and Cloudera’s
>legal counsel deemed it acceptable in the context of a data storage engine.
>There exists an unrelated open source project by the same name related to
>deployments on Microsoft’s Azure cloud service. We have been in contact
>with legal counsel from Microsoft and have obtained their approval for the
>use of the Kudu name.
>
>
>
>Cloudera currently owns several domain names related to Kudu (getkudu.io,
>kududb.io, et al) which will be transferred to the ASF and redirected to
>the official page during incubation.
>
>
>
>Portions of Kudu are protected by pending or published patents owned by
>Cloudera. Given the protections already granted by the Apache License, we
>do not anticipate any explicit licensing or transfer of this intellectual
>property.
>
>== External Dependencies ==
>
>The full set of dependencies and licenses are listed in
>https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
>and summarized here:
>
> * '''Twitter Bootstrap''': Apache 2.0
> * '''d3''': BSD 3-clause
> * '''epoch JS library''': MIT
> * '''lz4''': BSD 2-clause
> * '''gflags''': BSD 3-clause
> * '''glog''': BSD 3-clause
> * '''gperftools''': BSD 3-clause
> * '''libev''': BSD 2-clause
> * '''squeasel''': MIT
> * '''protobuf''': BSD 3-clause
> * '''rapidjson''': MIT
> * '''snappy''': BSD 3-clause
> * '''trace-viewer''': BSD 3-clause
> * '''zlib''': zlib license
> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> * '''bitshuffle''': MIT
> * '''boost''': Boost license
> * '''curl''': MIT
> * '''libunwind''': MIT
> * '''nvml''': BSD 3-clause
> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> * '''openssl''': OpenSSL License (BSD-alike)
>
> * '''Guava''': Apache 2.0
> * '''StumbleUpon Async''': BSD
> * '''Apache Hadoop''': Apache 2.0
> * '''Apache log4j''': Apache 2.0
> * '''Netty''': Apache 2.0
> * '''slf4j''': MIT
> * '''Apache Commons''': Apache 2.0
> * '''murmur''': Apache 2.0
>
>
>'''Build/test-only dependencies''':
>
> * '''CMake''': BSD 3-clause
> * '''gcovr''': BSD 3-clause
> * '''gmock''': BSD 3-clause
> * '''Apache Maven''': Apache 2.0
> * '''JUnit''': EPL
> * '''Mockito''': MIT
>
>== Cryptography ==
>
>Kudu does not currently include any cryptography-related code.
>
>== Required Resources ==
>
>=== Mailing lists ===
>
> * private@kudu.incubator.apache.org (PMC)
> * commits@kudu.incubator.apache.org (git push emails)
> * issues@kudu.incubator.apache.org (JIRA issue feed)
> * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> * user@kudu.incubator.apache.org (User questions)
>
>
>=== Repository ===
>
> * git://git.apache.org/kudu
>
>=== Gerrit ===
>
>We hope to continue using Gerrit for our code review and commit workflow.
>The Kudu team has already been in contact with Jake Farrell to start
>discussions on how Gerrit can fit into the ASF. We know that several other
>ASF projects and podlings are also interested in Gerrit.
>
>
>
>If the Infrastructure team does not have the bandwidth to support Gerrit,
>we will continue to support our own instance of Gerrit for Kudu, and make
>the necessary integrations such that commits are properly authenticated and
>maintain sufficient provenance to uphold the ASF standards (e.g. via the
>solution adopted by the AsterixDB podling).
>
>== Issue Tracking ==
>
>We would like to import our current JIRA project into the ASF JIRA, such
>that our historical commit messages and code comments continue to reference
>the appropriate bug numbers.
>
>== Initial Committers ==
>
> * Adar Dembo adar@cloudera.com
> * Alex Feinberg alex@strlen.net
> * Andrew Wang wang@apache.org
> * Dan Burkert dan@cloudera.com
> * David Alves dralves@apache.org
> * Jean-Daniel Cryans jdcryans@apache.org
> * Mike Percy mpercy@apache.org
> * Misty Stanley-Jones misty@apache.org
> * Todd Lipcon todd@apache.org
>
>The initial list of committers was seeded by listing those contributors who
>have contributed 20 or more patches in the last 12 months, indicating that
>they are active and have achieved merit through participation on the
>project. We chose not to include other contributors who either have not yet
>contributed a significant number of patches, or whose contributions are far
>in the past and whom we don’t expect to be active within the ASF.
>
>== Affiliations ==
>
> * Adar Dembo - Cloudera
> * Alex Feinberg - Forward Networks
> * Andrew Wang - Cloudera
> * Dan Burkert - Cloudera
> * David Alves - Cloudera
> * Jean-Daniel Cryans - Cloudera
> * Mike Percy - Cloudera
> * Misty Stanley-Jones - Cloudera
> * Todd Lipcon - Cloudera
>
>== Sponsors ==
>
>=== Champion ===
>
> * Todd Lipcon
>
>=== Nominated Mentors ===
>
> * Jake Farrell - ASF Member and Infra team member, Acquia
> * Brock Noland - ASF Member, StreamSets
> * Michael Stack - ASF Member, Cloudera
> * Jarek Jarcec Cecho - ASF Member, Cloudera
> * Chris Mattmann - ASF Member, NASA JPL and USC
> * Julien Le Dem - Incubator PMC, Dremio
> * Carl Steinbach - ASF Member, LinkedIn
>
>=== Sponsoring Entity ===
>
>The Apache Incubator






Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

+1

Regards
JB

On 11/24/2015 08:32 PM, Todd Lipcon wrote:
> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>   * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>   * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>   * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>   * '''Cloudera''' is the original development sponsor for Kudu.
>   * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>   * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>   * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>   * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>   * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>   * our code reviews are fully public and documented on a mailing list
>   * our daily development chatter is in a public chat room
>   * we send out weekly “community status” reports highlighting news and
> contributions
>   * we published our entire JIRA history and discuss bugs in the open
>   * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>   * Hadoop: Kudu provides MapReduce input/output formats for integration
>   * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>   * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>   * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>   * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>   * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>   * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>   * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses are listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>   * '''Twitter Bootstrap''': Apache 2.0
>   * '''d3''': BSD 3-clause
>   * '''epoch JS library''': MIT
>   * '''lz4''': BSD 2-clause
>   * '''gflags''': BSD 3-clause
>   * '''glog''': BSD 3-clause
>   * '''gperftools''': BSD 3-clause
>   * '''libev''': BSD 2-clause
>   * '''squeasel''': MIT
>   * '''protobuf''': BSD 3-clause
>   * '''rapidjson''': MIT
>   * '''snappy''': BSD 3-clause
>   * '''trace-viewer''': BSD 3-clause
>   * '''zlib''': zlib license
>   * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>   * '''bitshuffle''': MIT
>   * '''boost''': Boost license
>   * '''curl''': MIT
>   * '''libunwind''': MIT
>   * '''nvml''': BSD 3-clause
>   * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>   * '''openssl''': OpenSSL License (BSD-alike)
>
>   * '''Guava''': Apache 2.0
>   * '''StumbleUpon Async''': BSD
>   * '''Apache Hadoop''': Apache 2.0
>   * '''Apache log4j''': Apache 2.0
>   * '''Netty''': Apache 2.0
>   * '''slf4j''': MIT
>   * '''Apache Commons''': Apache 2.0
>   * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>   * '''CMake''': BSD 3-clause
>   * '''gcovr''': BSD 3-clause
>   * '''gmock''': BSD 3-clause
>   * '''Apache Maven''': Apache 2.0
>   * '''JUnit''': EPL
>   * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>   * private@kudu.incubator.apache.org (PMC)
>   * commits@kudu.incubator.apache.org (git push emails)
>   * issues@kudu.incubator.apache.org (JIRA issue feed)
>   * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>   * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>   * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>   * Adar Dembo adar@cloudera.com
>   * Alex Feinberg alex@strlen.net
>   * Andrew Wang wang@apache.org
>   * Dan Burkert dan@cloudera.com
>   * David Alves dralves@apache.org
>   * Jean-Daniel Cryans jdcryans@apache.org
>   * Mike Percy mpercy@apache.org
>   * Misty Stanley-Jones misty@apache.org
>   * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>   * Adar Dembo - Cloudera
>   * Alex Feinberg - Forward Networks
>   * Andrew Wang - Cloudera
>   * Dan Burkert - Cloudera
>   * David Alves - Cloudera
>   * Jean-Daniel Cryans - Cloudera
>   * Mike Percy - Cloudera
>   * Misty Stanley-Jones - Cloudera
>   * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>   * Todd Lipcon
>
> === Nominated Mentors ===
>
>   * Jake Farrell - ASF Member and Infra team member, Acquia
>   * Brock Noland - ASF Member, StreamSets
>   * Michael Stack - ASF Member, Cloudera
>   * Jarek Jarcec Cecho - ASF Member, Cloudera
>   * Chris Mattmann - ASF Member, NASA JPL and USC
>   * Julien Le Dem - Incubator PMC, Dremio
>   * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Reynold Xin <rx...@databricks.com>.

+1


On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Alex Karasulu <ak...@apache.org>.

+1 (binding)

On Tue, Nov 24, 2015 at 10:08 PM, Arvind Prabhakar <ar...@apache.org> wrote:

> +1 (binding)
>
> Regards,
> Arvind Prabhakar
>
> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
>
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -----
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations with
> > other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated and
> > captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> > These systems allow for low-latency record-level reads and writes, but lag
> > far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up to
> > fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in some
> > situations, Kudu offers a “happy medium” alternative that can dramatically
> > simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source storage
> > ecosystem. After our initial open source project release in September 2015,
> > we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build an
> > equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> >  * Move the existing codebase, website, documentation, and mailing lists to
> > Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> >  * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> >
> > ==== Releases ====
> >
> > Kudu has undergone one public release, tagged here
> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >
> > This initial release was not performed in the typical ASF fashion -- no
> > source tarball was released, but rather only convenience binaries made
> > available in Cloudera’s repositories. We will adopt the ASF source release
> > process upon joining the incubator.
> >
> >
> > ==== Source ====
> >
> > Kudu’s source is currently hosted on GitHub at
> > https://github.com/cloudera/kudu
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> >
> >
> > ==== Code review ====
> >
> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >
> > The Kudu developer community is very happy with Gerrit and hopes to work
> > with the Apache Infrastructure team to figure out how we can continue to
> > use Gerrit within ASF policies.
> >
> >
> >
> > ==== Issue tracking ====
> >
> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/KUDU/summary
> >
> > This JIRA instance contains bugs and development discussion dating back 2
> > years prior to Kudu’s open source release and will provide an initial seed
> > for the ASF JIRA.
> >
> >
> >
> > ==== Community discussion ====
> >
> > Kudu has several public discussion forums, linked here:
> > http://getkudu.io/community.html
> >
> >
> >
> > ==== Build Infrastructure ====
> >
> > The Kudu Gerrit instance is configured to only allow patches to be
> > committed after running them through an extensive set of pre-commit tests
> > and code lints. The project currently makes use of elastic public cloud
> > resources to perform these tests. Until this point, these resources have
> > been internal to Cloudera, though we are currently investing in moving to a
> > publicly accessible infrastructure.
> >
> >
> >
> > ==== Development practices ====
> >
> > Given that Kudu is a persistent storage engine, the community has a high
> > quality bar for contributions to its core. We have a firm belief that high
> > quality is achieved through automation, not manual inspection, and hence
> > put a focus on thorough testing and build infrastructure to ensure that
> > bar. The development community also practices review-then-commit for all
> > changes to ensure that changes are accompanied by appropriate tests, are
> > well commented, etc.
> >
> > Rather than seeing these practices as barriers to contribution, we believe
> > that a fully automated and standardized review and testing practice makes
> > it easier for new contributors to have patches accepted. Any new developer
> > may post a patch to Gerrit using the same workflow as a seasoned
> > contributor, and the same suite of tests will be automatically run. If the
> > tests pass, a committer can quickly review and commit the contribution from
> > their web browser.
> >
> > === Meritocracy ===
> >
> > We believe strongly in meritocracy in electing committers and PMC members.
> > We believe that contributions can come in forms other than just code: for
> > example, one of our initial proposed committers has contributed solely in
> > the area of project documentation. We will encourage contributions and
> > participation of all types, and ensure that contributors are appropriately
> > recognized.
> >
> > === Community ===
> >
> > Though Kudu is relatively new as an open source project, it has already
> > seen promising growth in its community across several organizations:
> >
> >  * '''Cloudera''' is the original development sponsor for Kudu.
> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > production use case, contributing code, benchmarks, feedback, and
> > conference talks.
> >  * '''Intel''' has contributed optimizations related to their hardware
> > technologies.
> >  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> > use case, and has been contributing bug reports and product feedback.
> >  * '''Dremio''' is working on integration with Apache Drill and exploring
> > using Kudu in a production use case.
> >  * Several community-built Docker images, tutorials, and blog posts have
> > sprouted up since Kudu’s release.
> >
> >
> >
> > By bringing Kudu to Apache, we hope to encourage further contribution from
> > the above organizations as well as to engage new users and contributors in
> > the community.
> >
> > === Core Developers ===
> >
> > Kudu was initially developed as a project at Cloudera. Most of the
> > contributions to date have been by developers employed by Cloudera.
> >
> >
> >
> > Many of the developers are committers or PMC members on other Apache
> > projects.
> >
> > === Alignment ===
> >
> > As a project in the big data ecosystem, Kudu is aligned with several other
> > ASF projects. Kudu includes input/output format integration with Apache
> > Hadoop, and this integration can also provide a bridge to Apache Spark. We
> > are planning to integrate with Apache Hive in the near future. We also
> > integrate closely with Cloudera Impala, which is also currently being
> > proposed for incubation. We have also scheduled a hackathon with the Apache
> > Drill team to work on integration with that query engine.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> > in the initial development of the project, and intends to grow its
> > investment over time as Kudu becomes a product adopted by its customer
> > base. Several other organizations are also experimenting with Kudu for
> > production use cases which would live for many years.
> >
> > === Inexperience with Open Source ===
> >
> > Kudu has been released in the open for less than two months. However, from
> > our very first public announcement we have been committed to open-source
> > style development:
> >
> >  * our code reviews are fully public and documented on a mailing list
> >  * our daily development chatter is in a public chat room
> >  * we send out weekly “community status” reports highlighting news and
> > contributions
> >  * we published our entire JIRA history and discuss bugs in the open
> >  * we published our entire Git commit history, going back three years (no
> > squashing)
> >
> >
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects (Hadoop,
> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employees or former employees of Cloudera.
> > However, the committers are spread across multiple offices (Palo Alto, San
> > Francisco, Melbourne), so the team is familiar with working in a
> > distributed environment across varied time zones.
> >
> >
> >
> > The project has received some contributions from developers outside of
> > Cloudera, and is starting to attract a ''user'' community as well. We hope
> > to continue to encourage contributions from these developers and community
> > members and grow them into committers after they have had time to continue
> > their contributions.
> >
> > === Reliance on Salaried Developers ===
> >
> > As mentioned above, the majority of development up to this point has been
> > sponsored by Cloudera. We have seen several community users participate in
> > discussions who are hobbyists interested in distributed systems and
> > databases, and hope that they will continue their participation in the
> > project going forward.
> >
> > === Relationships with Other Apache Products ===
> >
> > Kudu is currently related to the following other Apache projects:
> >
> >  * Hadoop: Kudu provides MapReduce input/output formats for integration
> >  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> > and work is progressing on support for Spark Data Frames and Spark SQL.
> >
> >
> >
> > The Kudu team has reached out to several other Apache projects to start
> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> >
> >
> >
> > Kudu integrates with Impala, which is also being proposed for incubation.
> >
> >
> >
> > Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> > from the Apache Drill community.
> >
> >
> >
> > We look forward to continuing to integrate and collaborate with these
> > communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Many of the initial committers are already experienced Apache committers,
> > and understand the true value provided by the Apache Way and the principles
> > of the ASF. We believe that this development and contribution model is
> > especially appropriate for storage products, where Apache’s
> > community-over-code philosophy ensures long term viability and
> > consensus-based participation.
> >
> > == Documentation ==
> >
> >  * Documentation is written in AsciiDoc and committed in the Kudu source
> > repository:
> >
> >  * https://github.com/cloudera/kudu/tree/master/docs
> >
> >
> >
> >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> > above repository.
> >
> >  * A LaTeX whitepaper is also published, and the source is available within
> > the same repository.
> >  * APIs are documented within the source code as JavaDoc or C++-style
> > documentation comments.
> >  * Many design documents are stored within the source code repository as
> > text files next to the code being documented.
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > The Kudu codebase and web site are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Kudu is already
> > licensed under the Apache 2.0 license.
> >
> >
> >
> > Some portions of the code are imported from other open source projects
> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> > other than the initial committers. These copyright notices are maintained
> > in those files as well as a top-level NOTICE.txt file. We believe this to
> > be permissible under the license terms and ASF policies, and confirmed via
> > a recent thread on general@incubator.apache.org .
> >
> >
> >
> > The “Kudu” name is not a registered trademark, though before the initial
> > release of the project, we performed a trademark search and Cloudera’s
> > legal counsel deemed it acceptable in the context of a data storage engine.
> > There exists an unrelated open source project by the same name related to
> > deployments on Microsoft’s Azure cloud service. We have been in contact
> > with legal counsel from Microsoft and have obtained their approval for the
> > use of the Kudu name.
> >
> >
> >
> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> > kududb.io, et al) which will be transferred to the ASF and redirected to
> > the official page during incubation.
> >
> >
> >
> > Portions of Kudu are protected by pending or published patents owned by
> > Cloudera. Given the protections already granted by the Apache License, we
> > do not anticipate any explicit licensing or transfer of this intellectual
> > property.
> >
> > == External Dependencies ==
> >
> > The full set of dependencies and licenses are listed in
> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >
> > and summarized here:
> >
> >  * '''Twitter Bootstrap''': Apache 2.0
> >  * '''d3''': BSD 3-clause
> >  * '''epoch JS library''': MIT
> >  * '''lz4''': BSD 2-clause
> >  * '''gflags''': BSD 3-clause
> >  * '''glog''': BSD 3-clause
> >  * '''gperftools''': BSD 3-clause
> >  * '''libev''': BSD 2-clause
> >  * '''squeasel''': MIT
> >  * '''protobuf''': BSD 3-clause
> >  * '''rapidjson''': MIT
> >  * '''snappy''': BSD 3-clause
> >  * '''trace-viewer''': BSD 3-clause
> >  * '''zlib''': zlib license
> >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> >  * '''bitshuffle''': MIT
> >  * '''boost''': Boost license
> >  * '''curl''': MIT
> >  * '''libunwind''': MIT
> >  * '''nvml''': BSD 3-clause
> >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> >  * '''openssl''': OpenSSL License (BSD-alike)
> >
> >  * '''Guava''': Apache 2.0
> >  * '''StumbleUpon Async''': BSD
> >  * '''Apache Hadoop''': Apache 2.0
> >  * '''Apache log4j''': Apache 2.0
> >  * '''Netty''': Apache 2.0
> >  * '''slf4j''': MIT
> >  * '''Apache Commons''': Apache 2.0
> >  * '''murmur''': Apache 2.0
> >
> >
> > '''Build/test-only dependencies''':
> >
> >  * '''CMake''': BSD 3-clause
> >  * '''gcovr''': BSD 3-clause
> >  * '''gmock''': BSD 3-clause
> >  * '''Apache Maven''': Apache 2.0
> >  * '''JUnit''': EPL
> >  * '''Mockito''': MIT
> >
> > == Cryptography ==
> >
> > Kudu does not currently include any cryptography-related code.
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> >  * private@kudu.incubator.apache.org (PMC)
> >  * commits@kudu.incubator.apache.org (git push emails)
> >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> >  * user@kudu.incubator.apache.org (User questions)
> >
> >
> > === Repository ===
> >
> >  * git://git.apache.org/kudu
> >
> > === Gerrit ===
> >
> > We hope to continue using Gerrit for our code review and commit workflow.
> > The Kudu team has already been in contact with Jake Farrell to start
> > discussions on how Gerrit can fit into the ASF. We know that several other
> > ASF projects and podlings are also interested in Gerrit.
> >
> >
> >
> > If the Infrastructure team does not have the bandwidth to support Gerrit,
> > we will continue to support our own instance of Gerrit for Kudu, and make
> > the necessary integrations such that commits are properly authenticated and
> > maintain sufficient provenance to uphold the ASF standards (e.g. via the
> > solution adopted by the AsterixDB podling).
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to reference
> > the appropriate bug numbers.
> >
> > == Initial Committers ==
> >
> >  * Adar Dembo adar@cloudera.com
> >  * Alex Feinberg alex@strlen.net
> >  * Andrew Wang wang@apache.org
> >  * Dan Burkert dan@cloudera.com
> >  * David Alves dralves@apache.org
> >  * Jean-Daniel Cryans jdcryans@apache.org
> >  * Mike Percy mpercy@apache.org
> >  * Misty Stanley-Jones misty@apache.org
> >  * Todd Lipcon todd@apache.org
> >
> > The initial list of committers was seeded by listing those contributors who
> > have contributed 20 or more patches in the last 12 months, indicating that
> > they are active and have achieved merit through participation on the
> > project. We chose not to include other contributors who either have not yet
> > contributed a significant number of patches, or whose contributions are far
> > in the past and we don’t expect to be active within the ASF.
> >
> > == Affiliations ==
> >
> >  * Adar Dembo - Cloudera
> >  * Alex Feinberg - Forward Networks
> >  * Andrew Wang - Cloudera
> >  * Dan Burkert - Cloudera
> >  * David Alves - Cloudera
> >  * Jean-Daniel Cryans - Cloudera
> >  * Mike Percy - Cloudera
> >  * Misty Stanley-Jones - Cloudera
> >  * Todd Lipcon - Cloudera
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> >  * Todd Lipcon
> >
> > === Nominated Mentors ===
> >
> >  * Jake Farrell - ASF Member and Infra team member, Acquia
> >  * Brock Noland - ASF Member, StreamSets
> >  * Michael Stack - ASF Member, Cloudera
> >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> >  * Chris Mattmann - ASF Member, NASA JPL and USC
> >  * Julien Le Dem - Incubator PMC, Dremio
> >  * Carl Steinbach - ASF Member, LinkedIn
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator
> >
>



-- 
Best Regards,
-- Alex

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Arvind Prabhakar <ar...@apache.org>.

+1 (binding)

Regards,
Arvind Prabhakar

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses are listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Tom White <to...@apache.org>.

+1 (binding)

Tom

On Tue, Nov 24, 2015 at 7:32 PM, Todd Lipcon <to...@apache.org> wrote:
> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository: https://github.com/cloudera/kudu/tree/master/docs
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, as confirmed via
> a recent thread on general@incubator.apache.org.
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Luke Han <lu...@gmail.com>.

+1 (non-binding)


Best Regards!
---------------------

Luke Han

On Wed, Nov 25, 2015 at 7:47 AM, Julien Le Dem <ju...@dremio.com> wrote:

> +1 (binding)
>
> On Tue, Nov 24, 2015 at 1:57 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
> > +1 (binding)
> >
> > On Wed, Nov 25, 2015 at 6:26 AM, Patrick Angeles
> > <pa...@gmail.com> wrote:
> > > +1 (non-binding)
> > >
> > > On Tue, Nov 24, 2015 at 4:23 PM, Jake Farrell <jf...@apache.org>
> > wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> -Jake
> > >>
> > >> On Tue, Nov 24, 2015 at 2:32 PM, Todd Lipcon <to...@apache.org> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd
> > like
> > >> to
> > >> > call a VOTE on acceptance of Kudu into the ASF Incubator. The
> > proposal is
> > >> > pasted below and also available on the wiki at:
> > >> > https://wiki.apache.org/incubator/KuduProposal
> > >> >
> > >> > The proposal is unchanged since the original version, except for the
> > >> > addition of Carl Steinbach as a Mentor.
> > >> >
> > >> > Please cast your votes:
> > >> >
> > >> > [] +1, accept Kudu into the Incubator
> > >> > [] +/-0, positive/negative non-counted expression of feelings
> > >> > [] -1, do not accept Kudu into the incubator (please state
> reasoning)
> > >> >
> > >> > Given the US holiday this week, I imagine many folks are traveling
> or
> > >> > otherwise offline. So, let's run the vote for a full week rather
> than
> > the
> > >> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > >> > period, the vote will close on Tues, Dec 1st at noon PST.
> > >> >
> > >> > Thanks
> > >> > -Todd
> > >> > -----
> > >> >
> > >> > = Kudu Proposal =
> > >> >
> > >> > == Abstract ==
> > >> >
> > >> > Kudu is a distributed columnar storage engine built for the Apache
> > Hadoop
> > >> > ecosystem.
> > >> >
> > >> > == Proposal ==
> > >> >
> > >> > Kudu is an open source storage engine for structured data which
> > supports
> > >> > low-latency random access together with efficient analytical access
> > >> > patterns. Kudu distributes data using horizontal partitioning and
> > >> > replicates each partition using Raft consensus, providing low
> > >> > mean-time-to-recovery and low tail latencies. Kudu is designed
> within
> > the
> > >> > context of the Apache Hadoop ecosystem and supports many
> integrations
> > >> with
> > >> > other data analytics projects both inside and outside of the Apache
> > >> > Software Foundation.
> > >> >
> > >> >
> > >> >
> > >> > We propose to incubate Kudu as a project of the Apache Software
> > >> Foundation.
> > >> >
> > >> > == Background ==
> > >> >
> > >> > In recent years, explosive growth in the amount of data being
> > generated
> > >> and
> > >> > captured by enterprises has resulted in the rapid adoption of open
> > source
> > >> > technology which is able to store massive data sets at scale and at
> > low
> > >> > cost. In particular, the Apache Hadoop ecosystem has become a focal
> > point
> > >> > for such “big data” workloads, because many traditional open source
> > >> > database systems have lagged in offering a scalable alternative.
> > >> >
> > >> >
> > >> >
> > >> > Structured storage in the Hadoop ecosystem has typically been
> > achieved in
> > >> > two ways: for static data sets, data is typically stored on Apache
> > HDFS
> > >> > using binary data formats such as Apache Avro or Apache Parquet.
> > However,
> > >> > neither HDFS nor these formats has any provision for updating
> > individual
> > >> > records, or for efficient random access. Mutable data sets are
> > typically
> > >> > stored in semi-structured stores such as Apache HBase or Apache
> > >> Cassandra.
> > >> > These systems allow for low-latency record-level reads and writes,
> but
> > >> lag
> > >> > far behind the static file formats in terms of sequential read
> > throughput
> > >> > for applications such as SQL-based analytics or machine learning.
> > >> >
> > >> >
> > >> >
> > >> > Kudu is a new storage system designed and implemented from the
> ground
> > up
> > >> to
> > >> > fill this gap between high-throughput sequential-access storage
> > systems
> > >> > such as HDFS and low-latency random-access systems such as HBase or
> > >> > Cassandra. While these existing systems continue to hold advantages
> in
> > >> some
> > >> > situations, Kudu offers a “happy medium” alternative that can
> > >> dramatically
> > >> > simplify the architecture of many common workloads. In particular,
> > Kudu
> > >> > offers a simple API for row-level inserts, updates, and deletes,
> while
> > >> > providing table scans at throughputs similar to Parquet, a
> > commonly-used
> > >> > columnar format for static data.
> > >> >
> > >> >
> > >> >
> > >> > More information on Kudu can be found at the existing open source
> > project
> > >> > website: http://getkudu.io and in particular in the Kudu
> white-paper
> > >> PDF:
> > >> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> > >> >
> > >> > == Rationale ==
> > >> >
> > >> > As described above, Kudu fills an important gap in the open source
> > >> storage
> > >> > ecosystem. After our initial open source project release in
> September
> > >> 2015,
> > >> > we have seen a great amount of interest across a diverse set of
> users
> > and
> > >> > companies. We believe that, as a storage system, it is critical to
> > build
> > >> an
> > >> > equally diverse set of contributors in the development community.
> Our
> > >> > experiences as committers and PMC members on other Apache projects
> > have
> > >> > taught us the value of diverse communities in ensuring both
> longevity
> > and
> > >> > high quality for such foundational systems.
> > >> >
> > >> > == Initial Goals ==
> > >> >
> > >> >  * Move the existing codebase, website, documentation, and mailing
> > lists
> > >> to
> > >> > Apache-hosted infrastructure
> > >> >  * Work with the infrastructure team to implement and approve our
> code
> > >> > review, build, and testing workflows in the context of the ASF
> > >> >  * Incremental development and releases per Apache guidelines
> > >> >
> > >> > == Current Status ==
> > >> >
> > >> > ==== Releases ====
> > >> >
> > >> > Kudu has undergone one public release, tagged here
> > >> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> > >> >
> > >> > This initial release was not performed in the typical ASF fashion --
> > no
> > >> > source tarball was released, but rather only convenience binaries
> made
> > >> > available in Cloudera’s repositories. We will adopt the ASF source
> > >> release
> > >> > process upon joining the incubator.
> > >> >
> > >> >
> > >> > ==== Source ====
> > >> >
> > >> > Kudu’s source is currently hosted on GitHub at
> > >> > https://github.com/cloudera/kudu
> > >> >
> > >> > This repository will be transitioned to Apache’s git hosting during
> > >> > incubation.
> > >> >
> > >> >
> > >> >
> > >> > ==== Code review ====
> > >> >
> > >> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > >> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> > >> >
> > >> > The Kudu developer community is very happy with Gerrit and hopes to
> > work
> > >> > with the Apache Infrastructure team to figure out how we can
> continue
> > to
> > >> > use Gerrit within ASF policies.
> > >> >
> > >> >
> > >> >
> > >> > ==== Issue tracking ====
> > >> >
> > >> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > >> > https://issues.cloudera.org/projects/KUDU/summary
> > >> >
> > >> > This JIRA instance contains bugs and development discussion dating
> > back 2
> > >> > years prior to Kudu’s open source release and will provide an
> initial
> > >> seed
> > >> > for the ASF JIRA.
> > >> >
> > >> >
> > >> >
> > >> > ==== Community discussion ====
> > >> >
> > >> > Kudu has several public discussion forums, linked here:
> > >> > http://getkudu.io/community.html
> > >> >
> > >> >
> > >> >
> > >> > ==== Build Infrastructure ====
> > >> >
> > >> > The Kudu Gerrit instance is configured to only allow patches to be
> > >> > committed after running them through an extensive set of pre-commit
> > tests
> > >> > and code lints. The project currently makes use of elastic public
> > cloud
> > >> > resources to perform these tests. Until this point, these resources
> > have
> > >> > been internal to Cloudera, though we are currently investing in
> moving
> > >> to a
> > >> > publicly accessible infrastructure.
> > >> >
> > >> >
> > >> >
> > >> > ==== Development practices ====
> > >> >
> > >> > Given that Kudu is a persistent storage engine, the community has a
> > high
> > >> > quality bar for contributions to its core. We have a firm belief
> that
> > >> high
> > >> > quality is achieved through automation, not manual inspection, and
> > hence
> > >> > put a focus on thorough testing and build infrastructure to ensure
> > that
> > >> > bar. The development community also practices review-then-commit for
> > all
> > >> > changes to ensure that changes are accompanied by appropriate tests,
> > are
> > >> > well commented, etc.
> > >> >
> > >> > Rather than seeing these practices as barriers to contribution, we
> > >> believe
> > >> > that a fully automated and standardized review and testing practice
> > makes
> > >> > it easier for new contributors to have patches accepted. Any new
> > >> developer
> > >> > may post a patch to Gerrit using the same workflow as a seasoned
> > >> > contributor, and the same suite of tests will be automatically run.
> If
> > >> the
> > >> > tests pass, a committer can quickly review and commit the
> contribution
> > >> from
> > >> > their web browser.
> > >> >
> > >> > === Meritocracy ===
> > >> >
> > >> > We believe strongly in meritocracy in electing committers and PMC
> > >> members.
> > >> > We believe that contributions can come in forms other than just
> code:
> > for
> > >> > example, one of our initial proposed committers has contributed
> > solely in
> > >> > the area of project documentation. We will encourage contributions
> and
> > >> > participation of all types, and ensure that contributors are
> > >> appropriately
> > >> > recognized.
> > >> >
> > >> > === Community ===
> > >> >
> > >> > Though Kudu is relatively new as an open source project, it has
> > already
> > >> > seen promising growth in its community across several organizations:
> > >> >
> > >> >  * '''Cloudera''' is the original development sponsor for Kudu.
> > >> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a
> > new
> > >> > production use case, contributing code, benchmarks, feedback, and
> > >> > conference talks.
> > >> >  * '''Intel''' has contributed optimizations related to their
> hardware
> > >> > technologies.
> > >> >  * '''Dropbox''' has been experimenting with Kudu for a machine
> > >> monitoring
> > >> > use case, and has been contributing bug reports and product
> feedback.
> > >> >  * '''Dremio''' is working on integration with Apache Drill and
> > exploring
> > >> > using Kudu in a production use case.
> > >> >  * Several community-built Docker images, tutorials, and blog posts
> > have
> > >> > sprouted up since Kudu’s release.
> > >> >
> > >> >
> > >> >
> > >> > By bringing Kudu to Apache, we hope to encourage further
> contribution
> > >> from
> > >> > the above organizations as well as to engage new users and
> > contributors
> > >> in
> > >> > the community.
> > >> >
> > >> > === Core Developers ===
> > >> >
> > >> > Kudu was initially developed as a project at Cloudera. Most of the
> > >> > contributions to date have been by developers employed by Cloudera.
> > >> >
> > >> >
> > >> >
> > >> > Many of the developers are committers or PMC members on other Apache
> > >> > projects.
> > >> >
> > >> > === Alignment ===
> > >> >
> > >> > As a project in the big data ecosystem, Kudu is aligned with several
> > >> other
> > >> > ASF projects. Kudu includes input/output format integration with
> > Apache
> > >> > Hadoop, and this integration can also provide a bridge to Apache
> > Spark.
> > >> We
> > >> > are planning to integrate with Apache Hive in the near future. We
> also
> > >> > integrate closely with Cloudera Impala, which is also currently
> being
> > >> > proposed for incubation. We have also scheduled a hackathon with the
> > >> Apache
> > >> > Drill team to work on integration with that query engine.
> > >> >
> > >> > == Known Risks ==
> > >> >
> > >> > === Orphaned Products ===
> > >> >
> > >> > The risk of Kudu being abandoned is low. Cloudera has invested a
> great
> > >> deal
> > >> > in the initial development of the project, and intends to grow its
> > >> > investment over time as Kudu becomes a product adopted by its
> customer
> > >> > base. Several other organizations are also experimenting with Kudu
> for
> > >> > production use cases which would live for many years.
> > >> >
> > >> > === Inexperience with Open Source ===
> > >> >
> > >> > Kudu has been released in the open for less than two months.
> However,
> > >> from
> > >> > our very first public announcement we have been committed to
> > open-source
> > >> > style development:
> > >> >
> > >> >  * our code reviews are fully public and documented on a mailing
> list
> > >> >  * our daily development chatter is in a public chat room
> > >> >  * we send out weekly “community status” reports highlighting news
> and
> > >> > contributions
> > >> >  * we published our entire JIRA history and discuss bugs in the open
> > >> >  * we published our entire Git commit history, going back three
> years
> > (no
> > >> > squashing)
> > >> >
> > >> >
> > >> >
> > >> > Several of the initial committers are experienced open source
> > developers,
> > >> > several being committers and/or PMC members on other ASF projects
> > >> (Hadoop,
> > >> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > >> > experience on non-ASF open source projects (Kiji, open-vm-tools, et
> > al).
> > >> >
> > >> > === Homogenous Developers ===
> > >> >
> > >> > The initial committers are employees or former employees of
> Cloudera.
> > >> > However, the committers are spread across multiple offices (Palo
> Alto,
> > >> San
> > >> > Francisco, Melbourne), so the team is familiar with working in a
> > >> > distributed environment across varied time zones.
> > >> >
> > >> >
> > >> >
> > >> > The project has received some contributions from developers outside
> of
> > >> > Cloudera, and is starting to attract a ''user'' community as well.
> We
> > >> hope
> > >> > to continue to encourage contributions from these developers and
> > >> community
> > >> > members and grow them into committers after they have had time to
> > >> continue
> > >> > their contributions.
> > >> >
> > >> > === Reliance on Salaried Developers ===
> > >> >
> > >> > As mentioned above, the majority of development up to this point has
> > been
> > >> > sponsored by Cloudera. We have seen several community users
> > participate
> > >> in
> > >> > discussions who are hobbyists interested in distributed systems and
> > >> > databases, and hope that they will continue their participation in
> the
> > >> > project going forward.
> > >> >
> > >> > === Relationships with Other Apache Products ===
> > >> >
> > >> > Kudu is currently related to the following other Apache projects:
> > >> >
> > >> >  * Hadoop: Kudu provides MapReduce input/output formats for
> > integration
> > >> >  * Spark: Kudu integrates with Spark via the above-mentioned input
> > >> formats,
> > >> > and work is progressing on support for Spark Data Frames and Spark
> > SQL.
> > >> >
> > >> >
> > >> >
> > >> > The Kudu team has reached out to several other Apache projects to
> > start
> > >> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> > >> >
> > >> >
> > >> >
> > >> > Kudu integrates with Impala, which is also being proposed for
> > incubation.
> > >> >
> > >> >
> > >> >
> > >> > Kudu is already collaborating on ValueVector, a proposed TLP
> spinning
> > out
> > >> > from the Apache Drill community.
> > >> >
> > >> >
> > >> >
> > >> > We look forward to continuing to integrate and collaborate with
> these
> > >> > communities.
> > >> >
> > >> > === An Excessive Fascination with the Apache Brand ===
> > >> >
> > >> > Many of the initial committers are already experienced Apache
> > committers,
> > >> > and understand the true value provided by the Apache Way and the
> > >> principles
> > >> > of the ASF. We believe that this development and contribution model
> is
> > >> > especially appropriate for storage products, where Apache’s
> > >> > community-over-code philosophy ensures long term viability and
> > >> > consensus-based participation.
> > >> >
> > >> > == Documentation ==
> > >> >
> > >> >  * Documentation is written in AsciiDoc and committed in the Kudu
> > source
> > >> > repository:
> > >> >
> > >> >  * https://github.com/cloudera/kudu/tree/master/docs
> > >> >
> > >> >
> > >> >
> > >> >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch
> of
> > >> the
> > >> > above repository.
> > >> >
> > >> >  * A LaTeX whitepaper is also published, and the source is available
> > >> within
> > >> > the same repository.
> > >> >  * APIs are documented within the source code as JavaDoc or
> C++-style
> > >> > documentation comments.
> > >> >  * Many design documents are stored within the source code
> repository
> > as
> > >> > text files next to the code being documented.
> > >> >
> > >> > == Source and Intellectual Property Submission Plan ==
> > >> >
> > >> > The Kudu codebase and web site are currently hosted on GitHub and
> will
> > be
> > >> > transitioned to the ASF repositories during incubation. Kudu is
> > already
> > >> > licensed under the Apache 2.0 license.
> > >> >
> > >> >
> > >> >
> > >> > Some portions of the code are imported from other open source
> projects
> > >> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by
> > >> authors
> > >> > other than the initial committers. These copyright notices are
> > maintained
> > >> > in those files as well as a top-level NOTICE.txt file. We believe
> > this to
> > >> > be permissible under the license terms and ASF policies, and
> confirmed
> > >> via
> > >> > a recent thread on general@incubator.apache.org .
> > >> >
> > >> >
> > >> >
> > >> > The “Kudu” name is not a registered trademark, though before the
> > initial
> > >> > release of the project, we performed a trademark search and
> Cloudera’s
> > >> > legal counsel deemed it acceptable in the context of a data storage
> > >> engine.
> > >> > There exists an unrelated open source project by the same name
> > related to
> > >> > deployments on Microsoft’s Azure cloud service. We have been in
> > contact
> > >> > with legal counsel from Microsoft and have obtained their approval
> for
> > >> the
> > >> > use of the Kudu name.
> > >> >
> > >> >
> > >> >
> > >> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> > >> > kududb.io, et al) which will be transferred to the ASF and redirected to
> > >> > the official page during incubation.
> > >> >
> > >> >
> > >> >
> > >> > Portions of Kudu are protected by pending or published patents owned
> > by
> > >> > Cloudera. Given the protections already granted by the Apache
> > License, we
> > >> > do not anticipate any explicit licensing or transfer of this
> > intellectual
> > >> > property.
> > >> >
> > >> > == External Dependencies ==
> > >> >
> > >> > The full set of dependencies and licenses is listed in
> > >> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> > >> >
> > >> > and summarized here:
> > >> >
> > >> >  * '''Twitter Bootstrap''': Apache 2.0
> > >> >  * '''d3''': BSD 3-clause
> > >> >  * '''epoch JS library''': MIT
> > >> >  * '''lz4''': BSD 2-clause
> > >> >  * '''gflags''': BSD 3-clause
> > >> >  * '''glog''': BSD 3-clause
> > >> >  * '''gperftools''': BSD 3-clause
> > >> >  * '''libev''': BSD 2-clause
> > >> >  * '''squeasel''': MIT license
> > >> >  * '''protobuf''': BSD 3-clause
> > >> >  * '''rapidjson''': MIT
> > >> >  * '''snappy''': BSD 3-clause
> > >> >  * '''trace-viewer''': BSD 3-clause
> > >> >  * '''zlib''': zlib license
> > >> >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> > >> >  * '''bitshuffle''': MIT
> > >> >  * '''boost''': Boost license
> > >> >  * '''curl''': MIT
> > >> >  * '''libunwind''': MIT
> > >> >  * '''nvml''': BSD 3-clause
> > >> >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> > >> >  * '''openssl''': OpenSSL License (BSD-alike)
> > >> >
> > >> >  * '''Guava''': Apache 2.0
> > >> >  * '''StumbleUpon Async''': BSD
> > >> >  * '''Apache Hadoop''': Apache 2.0
> > >> >  * '''Apache log4j''': Apache 2.0
> > >> >  * '''Netty''': Apache 2.0
> > >> >  * '''slf4j''': MIT
> > >> >  * '''Apache Commons''': Apache 2.0
> > >> >  * '''murmur''': Apache 2.0
> > >> >
> > >> >
> > >> > '''Build/test-only dependencies''':
> > >> >
> > >> >  * '''CMake''': BSD 3-clause
> > >> >  * '''gcovr''': BSD 3-clause
> > >> >  * '''gmock''': BSD 3-clause
> > >> >  * '''Apache Maven''': Apache 2.0
> > >> >  * '''JUnit''': EPL
> > >> >  * '''Mockito''': MIT
> > >> >
> > >> > == Cryptography ==
> > >> >
> > >> > Kudu does not currently include any cryptography-related code.
> > >> >
> > >> > == Required Resources ==
> > >> >
> > >> > === Mailing lists ===
> > >> >
> > >> >  * private@kudu.incubator.apache.org (PMC)
> > >> >  * commits@kudu.incubator.apache.org (git push emails)
> > >> >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> > >> >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev
> > >> discussion)
> > >> >  * user@kudu.incubator.apache.org (User questions)
> > >> >
> > >> >
> > >> > === Repository ===
> > >> >
> > >> >  * git://git.apache.org/kudu
> > >> >
> > >> > === Gerrit ===
> > >> >
> > >> > We hope to continue using Gerrit for our code review and commit
> > workflow.
> > >> > The Kudu team has already been in contact with Jake Farrell to start
> > >> > discussions on how Gerrit can fit into the ASF. We know that several
> > >> other
> > >> > ASF projects and podlings are also interested in Gerrit.
> > >> >
> > >> >
> > >> >
> > >> > If the Infrastructure team does not have the bandwidth to support
> > Gerrit,
> > >> > we will continue to support our own instance of Gerrit for Kudu, and
> > make
> > >> > the necessary integrations such that commits are properly
> > authenticated
> > >> and
> > >> > maintain sufficient provenance to uphold the ASF standards (e.g. via
> > the
> > >> > solution adopted by the AsterixDB podling).
> > >> >
> > >> > == Issue Tracking ==
> > >> >
> > >> > We would like to import our current JIRA project into the ASF JIRA,
> > such
> > >> > that our historical commit messages and code comments continue to
> > >> reference
> > >> > the appropriate bug numbers.
> > >> >
> > >> > == Initial Committers ==
> > >> >
> > >> >  * Adar Dembo adar@cloudera.com
> > >> >  * Alex Feinberg alex@strlen.net
> > >> >  * Andrew Wang wang@apache.org
> > >> >  * Dan Burkert dan@cloudera.com
> > >> >  * David Alves dralves@apache.org
> > >> >  * Jean-Daniel Cryans jdcryans@apache.org
> > >> >  * Mike Percy mpercy@apache.org
> > >> >  * Misty Stanley-Jones misty@apache.org
> > >> >  * Todd Lipcon todd@apache.org
> > >> >
> > >> > The initial list of committers was seeded by listing those
> > contributors
> > >> who
> > >> > have contributed 20 or more patches in the last 12 months,
> indicating
> > >> that
> > >> > they are active and have achieved merit through participation on the
> > >> > project. We chose not to include other contributors who either have not yet
> > >> > contributed a significant number of patches, or whose contributions are far
> > >> > in the past and who we don’t expect to be active within the ASF.
> > >> >
> > >> > == Affiliations ==
> > >> >
> > >> >  * Adar Dembo - Cloudera
> > >> >  * Alex Feinberg - Forward Networks
> > >> >  * Andrew Wang - Cloudera
> > >> >  * Dan Burkert - Cloudera
> > >> >  * David Alves - Cloudera
> > >> >  * Jean-Daniel Cryans - Cloudera
> > >> >  * Mike Percy - Cloudera
> > >> >  * Misty Stanley-Jones - Cloudera
> > >> >  * Todd Lipcon - Cloudera
> > >> >
> > >> > == Sponsors ==
> > >> >
> > >> > === Champion ===
> > >> >
> > >> >  * Todd Lipcon
> > >> >
> > >> > === Nominated Mentors ===
> > >> >
> > >> >  * Jake Farrell - ASF Member and Infra team member, Acquia
> > >> >  * Brock Noland - ASF Member, StreamSets
> > >> >  * Michael Stack - ASF Member, Cloudera
> > >> >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> > >> >  * Chris Mattmann - ASF Member, NASA JPL and USC
> > >> >  * Julien Le Dem - Incubator PMC, Dremio
> > >> >  * Carl Steinbach - ASF Member, LinkedIn
> > >> >
> > >> > === Sponsoring Entity ===
> > >> >
> > >> > The Apache Incubator
> > >> >
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> >
>
>
> --
> Julien
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Julien Le Dem <ju...@dremio.com>.

+1 (binding)

On Tue, Nov 24, 2015 at 1:57 PM, Edward J. Yoon <ed...@apache.org>
wrote:

> +1 (binding)
>
> On Wed, Nov 25, 2015 at 6:26 AM, Patrick Angeles
> <pa...@gmail.com> wrote:
> > +1 (non-binding)
> >
> > On Tue, Nov 24, 2015 at 4:23 PM, Jake Farrell <jf...@apache.org>
> wrote:
> >
> >> +1 (binding)
> >>
> >> -Jake
> >>
> >> On Tue, Nov 24, 2015 at 2:32 PM, Todd Lipcon <to...@apache.org> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd
> like
> >> to
> >> > call a VOTE on acceptance of Kudu into the ASF Incubator. The
> proposal is
> >> > pasted below and also available on the wiki at:
> >> > https://wiki.apache.org/incubator/KuduProposal
> >> >
> >> > The proposal is unchanged since the original version, except for the
> >> > addition of Carl Steinbach as a Mentor.
> >> >
> >> > Please cast your votes:
> >> >
> >> > [] +1, accept Kudu into the Incubator
> >> > [] +/-0, positive/negative non-counted expression of feelings
> >> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >> >
> >> > Given the US holiday this week, I imagine many folks are traveling or
> >> > otherwise offline. So, let's run the vote for a full week rather than
> the
> >> > traditional 72 hours. Unless the IPMC objects to the extended voting
> >> > period, the vote will close on Tues, Dec 1st at noon PST.
> >> >
> >> > Thanks
> >> > -Todd
> >> > -----
> >> >
> >> > = Kudu Proposal =
> >> >
> >> > == Abstract ==
> >> >
> >> > Kudu is a distributed columnar storage engine built for the Apache
> Hadoop
> >> > ecosystem.
> >> >
> >> > == Proposal ==
> >> >
> >> > Kudu is an open source storage engine for structured data which
> supports
> >> > low-latency random access together with efficient analytical access
> >> > patterns. Kudu distributes data using horizontal partitioning and
> >> > replicates each partition using Raft consensus, providing low
> >> > mean-time-to-recovery and low tail latencies. Kudu is designed within
> the
> >> > context of the Apache Hadoop ecosystem and supports many integrations
> >> with
> >> > other data analytics projects both inside and outside of the Apache
> >> > Software Foundation.
> >> >
> >> >
> >> >
> >> > We propose to incubate Kudu as a project of the Apache Software
> >> Foundation.
> >> >
> >> > == Background ==
> >> >
> >> > In recent years, explosive growth in the amount of data being
> generated
> >> and
> >> > captured by enterprises has resulted in the rapid adoption of open
> source
> >> > technology which is able to store massive data sets at scale and at
> low
> >> > cost. In particular, the Apache Hadoop ecosystem has become a focal
> point
> >> > for such “big data” workloads, because many traditional open source
> >> > database systems have lagged in offering a scalable alternative.
> >> >
> >> >
> >> >
> >> > Structured storage in the Hadoop ecosystem has typically been
> achieved in
> >> > two ways: for static data sets, data is typically stored on Apache
> HDFS
> >> > using binary data formats such as Apache Avro or Apache Parquet.
> However,
> >> > neither HDFS nor these formats has any provision for updating
> individual
> >> > records, or for efficient random access. Mutable data sets are
> typically
> >> > stored in semi-structured stores such as Apache HBase or Apache
> >> Cassandra.
> >> > These systems allow for low-latency record-level reads and writes, but
> >> lag
> >> > far behind the static file formats in terms of sequential read
> throughput
> >> > for applications such as SQL-based analytics or machine learning.
> >> >
> >> >
> >> >
> >> > Kudu is a new storage system designed and implemented from the ground
> up
> >> to
> >> > fill this gap between high-throughput sequential-access storage
> systems
> >> > such as HDFS and low-latency random-access systems such as HBase or
> >> > Cassandra. While these existing systems continue to hold advantages in
> >> some
> >> > situations, Kudu offers a “happy medium” alternative that can
> >> dramatically
> >> > simplify the architecture of many common workloads. In particular,
> Kudu
> >> > offers a simple API for row-level inserts, updates, and deletes, while
> >> > providing table scans at throughputs similar to Parquet, a
> commonly-used
> >> > columnar format for static data.
> >> >
> >> >
> >> >
> >> > More information on Kudu can be found at the existing open source
> project
> >> > website: http://getkudu.io and in particular in the Kudu white-paper
> >> PDF:
> >> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >> >
> >> > == Rationale ==
> >> >
> >> > As described above, Kudu fills an important gap in the open source
> >> storage
> >> > ecosystem. After our initial open source project release in September
> >> 2015,
> >> > we have seen a great amount of interest across a diverse set of users
> and
> >> > companies. We believe that, as a storage system, it is critical to
> build
> >> an
> >> > equally diverse set of contributors in the development community. Our
> >> > experiences as committers and PMC members on other Apache projects
> have
> >> > taught us the value of diverse communities in ensuring both longevity
> and
> >> > high quality for such foundational systems.
> >> >
> >> > == Initial Goals ==
> >> >
> >> >  * Move the existing codebase, website, documentation, and mailing
> lists
> >> to
> >> > Apache-hosted infrastructure
> >> >  * Work with the infrastructure team to implement and approve our code
> >> > review, build, and testing workflows in the context of the ASF
> >> >  * Incremental development and releases per Apache guidelines
> >> >
> >> > == Current Status ==
> >> >
> >> > ==== Releases ====
> >> >
> >> > Kudu has undergone one public release, tagged here
> >> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >> >
> >> > This initial release was not performed in the typical ASF fashion --
> no
> >> > source tarball was released, but rather only convenience binaries made
> >> > available in Cloudera’s repositories. We will adopt the ASF source
> >> release
> >> > process upon joining the incubator.
> >> >
> >> >
> >> > ==== Source ====
> >> >
> >> > Kudu’s source is currently hosted on GitHub at
> >> > https://github.com/cloudera/kudu
> >> >
> >> > This repository will be transitioned to Apache’s git hosting during
> >> > incubation.
> >> >
> >> >
> >> >
> >> > ==== Code review ====
> >> >
> >> > Kudu’s code reviews are currently public and hosted on Gerrit at
> >> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >> >
> >> > The Kudu developer community is very happy with Gerrit and hopes to
> work
> >> > with the Apache Infrastructure team to figure out how we can continue
> to
> >> > use Gerrit within ASF policies.
> >> >
> >> >
> >> >
> >> > ==== Issue tracking ====
> >> >
> >> > Kudu’s bug and feature tracking is hosted on JIRA at:
> >> > https://issues.cloudera.org/projects/KUDU/summary
> >> >
> >> > This JIRA instance contains bugs and development discussion dating
> back 2
> >> > years prior to Kudu’s open source release and will provide an initial
> >> seed
> >> > for the ASF JIRA.
> >> >
> >> >
> >> >
> >> > ==== Community discussion ====
> >> >
> >> > Kudu has several public discussion forums, linked here:
> >> > http://getkudu.io/community.html
> >> >
> >> >
> >> >
> >> > ==== Build Infrastructure ====
> >> >
> >> > The Kudu Gerrit instance is configured to only allow patches to be
> >> > committed after running them through an extensive set of pre-commit
> tests
> >> > and code lints. The project currently makes use of elastic public
> cloud
> >> > resources to perform these tests. Until this point, these resources
> have
> >> > been internal to Cloudera, though we are currently investing in moving
> >> to a
> >> > publicly accessible infrastructure.
> >> >
> >> >
> >> >
> >> > ==== Development practices ====
> >> >
> >> > Given that Kudu is a persistent storage engine, the community has a
> high
> >> > quality bar for contributions to its core. We have a firm belief that
> >> high
> >> > quality is achieved through automation, not manual inspection, and
> hence
> >> > put a focus on thorough testing and build infrastructure to ensure
> that
> >> > bar. The development community also practices review-then-commit for
> all
> >> > changes to ensure that changes are accompanied by appropriate tests,
> are
> >> > well commented, etc.
> >> >
> >> > Rather than seeing these practices as barriers to contribution, we
> >> believe
> >> > that a fully automated and standardized review and testing practice
> makes
> >> > it easier for new contributors to have patches accepted. Any new
> >> developer
> >> > may post a patch to Gerrit using the same workflow as a seasoned
> >> > contributor, and the same suite of tests will be automatically run. If
> >> the
> >> > tests pass, a committer can quickly review and commit the contribution
> >> from
> >> > their web browser.
> >> >
> >> > === Meritocracy ===
> >> >
> >> > We believe strongly in meritocracy in electing committers and PMC
> >> members.
> >> > We believe that contributions can come in forms other than just code:
> for
> >> > example, one of our initial proposed committers has contributed
> solely in
> >> > the area of project documentation. We will encourage contributions and
> >> > participation of all types, and ensure that contributors are
> >> appropriately
> >> > recognized.
> >> >
> >> > === Community ===
> >> >
> >> > Though Kudu is relatively new as an open source project, it has
> already
> >> > seen promising growth in its community across several organizations:
> >> >
> >> >  * '''Cloudera''' is the original development sponsor for Kudu.
> >> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a
> new
> >> > production use case, contributing code, benchmarks, feedback, and
> >> > conference talks.
> >> >  * '''Intel''' has contributed optimizations related to their hardware
> >> > technologies.
> >> >  * '''Dropbox''' has been experimenting with Kudu for a machine
> >> monitoring
> >> > use case, and has been contributing bug reports and product feedback.
> >> >  * '''Dremio''' is working on integration with Apache Drill and
> exploring
> >> > using Kudu in a production use case.
> >> >  * Several community-built Docker images, tutorials, and blog posts
> have
> >> > sprouted up since Kudu’s release.
> >> >
> >> >
> >> >
> >> > By bringing Kudu to Apache, we hope to encourage further contribution
> >> from
> >> > the above organizations as well as to engage new users and
> contributors
> >> in
> >> > the community.
> >> >
> >> > === Core Developers ===
> >> >
> >> > Kudu was initially developed as a project at Cloudera. Most of the
> >> > contributions to date have been by developers employed by Cloudera.
> >> >
> >> >
> >> >
> >> > Many of the developers are committers or PMC members on other Apache
> >> > projects.
> >> >
> >> > === Alignment ===
> >> >
> >> > As a project in the big data ecosystem, Kudu is aligned with several
> >> other
> >> > ASF projects. Kudu includes input/output format integration with
> Apache
> >> > Hadoop, and this integration can also provide a bridge to Apache
> Spark.
> >> We
> >> > are planning to integrate with Apache Hive in the near future. We also
> >> > integrate closely with Cloudera Impala, which is also currently being
> >> > proposed for incubation. We have also scheduled a hackathon with the
> >> Apache
> >> > Drill team to work on integration with that query engine.
> >> >
> >> > == Known Risks ==
> >> >
> >> > === Orphaned Products ===
> >> >
> >> > The risk of Kudu being abandoned is low. Cloudera has invested a great
> >> deal
> >> > in the initial development of the project, and intends to grow its
> >> > investment over time as Kudu becomes a product adopted by its customer
> >> > base. Several other organizations are also experimenting with Kudu for
> >> > production use cases which would live for many years.
> >> >
> >> > === Inexperience with Open Source ===
> >> >
> >> > Kudu has been released in the open for less than two months. However,
> >> from
> >> > our very first public announcement we have been committed to
> open-source
> >> > style development:
> >> >
> >> >  * our code reviews are fully public and documented on a mailing list
> >> >  * our daily development chatter is in a public chat room
> >> >  * we send out weekly “community status” reports highlighting news and
> >> > contributions
> >> >  * we published our entire JIRA history and discuss bugs in the open
> >> >  * we published our entire Git commit history, going back three years
> (no
> >> > squashing)
> >> >
> >> >
> >> >
> >> > Several of the initial committers are experienced open source
> developers,
> >> > several being committers and/or PMC members on other ASF projects
> >> (Hadoop,
> >> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> >> > experience on non-ASF open source projects (Kiji, open-vm-tools, et
> al).
> >> >
> >> > === Homogenous Developers ===
> >> >
> >> > The initial committers are employees or former employees of Cloudera.
> >> > However, the committers are spread across multiple offices (Palo Alto,
> >> San
> >> > Francisco, Melbourne), so the team is familiar with working in a
> >> > distributed environment across varied time zones.
> >> >
> >> >
> >> >
> >> > The project has received some contributions from developers outside of
> >> > Cloudera, and is starting to attract a ''user'' community as well. We
> >> hope
> >> > to continue to encourage contributions from these developers and
> >> community
> >> > members and grow them into committers after they have had time to
> >> continue
> >> > their contributions.
> >> >
> >> > === Reliance on Salaried Developers ===
> >> >
> >> > As mentioned above, the majority of development up to this point has
> been
> >> > sponsored by Cloudera. We have seen several community users
> participate
> >> in
> >> > discussions who are hobbyists interested in distributed systems and
> >> > databases, and hope that they will continue their participation in the
> >> > project going forward.
> >> >
> >> > === Relationships with Other Apache Products ===
> >> >
> >> > Kudu is currently related to the following other Apache projects:
> >> >
> >> >  * Hadoop: Kudu provides MapReduce input/output formats for
> integration
> >> >  * Spark: Kudu integrates with Spark via the above-mentioned input
> >> formats,
> >> > and work is progressing on support for Spark Data Frames and Spark
> SQL.
> >> >
> >> >
> >> >
> >> > The Kudu team has reached out to several other Apache projects to
> start
> >> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> >> >
> >> >
> >> >
> >> > Kudu integrates with Impala, which is also being proposed for
> incubation.
> >> >
> >> >
> >> >
> >> > Kudu is already collaborating on ValueVector, a proposed TLP spinning
> out
> >> > from the Apache Drill community.
> >> >
> >> >
> >> >
> >> > We look forward to continuing to integrate and collaborate with these
> >> > communities.
> >> >
> >> > === An Excessive Fascination with the Apache Brand ===
> >> >
> >> > Many of the initial committers are already experienced Apache
> committers,
> >> > and understand the true value provided by the Apache Way and the
> >> principles
> >> > of the ASF. We believe that this development and contribution model is
> >> > especially appropriate for storage products, where Apache’s
> >> > community-over-code philosophy ensures long term viability and
> >> > consensus-based participation.
> >> >
> >> > == Documentation ==
> >> >
> >> >  * Documentation is written in AsciiDoc and committed in the Kudu
> source
> >> > repository:
> >> >
> >> >  * https://github.com/cloudera/kudu/tree/master/docs
> >> >
> >> >
> >> >
> >> >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of
> >> the
> >> > above repository.
> >> >
> >> >  * A LaTeX whitepaper is also published, and the source is available
> >> within
> >> > the same repository.
> >> >  * APIs are documented within the source code as JavaDoc or C++-style
> >> > documentation comments.
> >> >  * Many design documents are stored within the source code repository
> as
> >> > text files next to the code being documented.
> >> >
> >> > == Source and Intellectual Property Submission Plan ==
> >> >
> >> > The Kudu codebase and web site are currently hosted on GitHub and will
> be
> >> > transitioned to the ASF repositories during incubation. Kudu is
> already
> >> > licensed under the Apache 2.0 license.
> >> >
> >> >
> >> >
> >> > Some portions of the code are imported from other open source projects
> >> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by
> >> authors
> >> > other than the initial committers. These copyright notices are
> maintained
> >> > in those files as well as a top-level NOTICE.txt file. We believe
> this to
> >> > be permissible under the license terms and ASF policies, and confirmed
> >> via
> >> > a recent thread on general@incubator.apache.org .
> >> >
> >> >
> >> >
> >> > The “Kudu” name is not a registered trademark, though before the
> initial
> >> > release of the project, we performed a trademark search and Cloudera’s
> >> > legal counsel deemed it acceptable in the context of a data storage
> >> engine.
> >> > There exists an unrelated open source project by the same name
> related to
> >> > deployments on Microsoft’s Azure cloud service. We have been in
> contact
> >> > with legal counsel from Microsoft and have obtained their approval for
> >> the
> >> > use of the Kudu name.
> >> >
> >> >
> >> >
> >> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> >> > kududb.io, et al) which will be transferred to the ASF and redirected to
> >> > the official page during incubation.
> >> >
> >> >
> >> >
> >> > Portions of Kudu are protected by pending or published patents owned
> by
> >> > Cloudera. Given the protections already granted by the Apache
> License, we
> >> > do not anticipate any explicit licensing or transfer of this
> intellectual
> >> > property.
> >> >
> >> > == External Dependencies ==
> >> >
> >> > The full set of dependencies and licenses is listed in
> >> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >> >
> >> > and summarized here:
> >> >
> >> >  * '''Twitter Bootstrap''': Apache 2.0
> >> >  * '''d3''': BSD 3-clause
> >> >  * '''epoch JS library''': MIT
> >> >  * '''lz4''': BSD 2-clause
> >> >  * '''gflags''': BSD 3-clause
> >> >  * '''glog''': BSD 3-clause
> >> >  * '''gperftools''': BSD 3-clause
> >> >  * '''libev''': BSD 2-clause
> >> >  * '''squeasel''':MIT license
> >> >  * '''protobuf''': BSD 3-clause
> >> >  * '''rapidjson''': MIT
> >> >  * '''snappy''': BSD 3-clause
> >> >  * '''trace-viewer''': BSD 3-clause
> >> >  * '''zlib''': zlib license
> >> >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> >> >  * '''bitshuffle''': MIT
> >> >  * '''boost''': Boost license
> >> >  * '''curl''': MIT
> >> >  * '''libunwind''': MIT
> >> >  * '''nvml''': BSD 3-clause
> >> >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> >> >  * '''openssl''': OpenSSL License (BSD-alike)
> >> >
> >> >  * '''Guava''': Apache 2.0
> >> >  * '''StumbleUpon Async''': BSD
> >> >  * '''Apache Hadoop''': Apache 2.0
> >> >  * '''Apache log4j''': Apache 2.0
> >> >  * '''Netty''': Apache 2.0
> >> >  * '''slf4j''': MIT
> >> >  * '''Apache Commons''': Apache 2.0
> >> >  * '''murmur''': Apache 2.0
> >> >
> >> >
> >> > '''Build/test-only dependencies''':
> >> >
> >> >  * '''CMake''': BSD 3-clause
> >> >  * '''gcovr''': BSD 3-clause
> >> >  * '''gmock''': BSD 3-clause
> >> >  * '''Apache Maven''': Apache 2.0
> >> >  * '''JUnit''': EPL
> >> >  * '''Mockito''': MIT
> >> >
> >> > == Cryptography ==
> >> >
> >> > Kudu does not currently include any cryptography-related code.
> >> >
> >> > == Required Resources ==
> >> >
> >> > === Mailing lists ===
> >> >
> >> >  * private@kudu.incubator.apache.org (PMC)
> >> >  * commits@kudu.incubator.apache.org (git push emails)
> >> >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> >> >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev
> >> discussion)
> >> >  * user@kudu.incubator.apache.org (User questions)
> >> >
> >> >
> >> > === Repository ===
> >> >
> >> >  * git://git.apache.org/kudu
> >> >
> >> > === Gerrit ===
> >> >
> >> > We hope to continue using Gerrit for our code review and commit
> workflow.
> >> > The Kudu team has already been in contact with Jake Farrell to start
> >> > discussions on how Gerrit can fit into the ASF. We know that several
> >> other
> >> > ASF projects and podlings are also interested in Gerrit.
> >> >
> >> >
> >> >
> >> > If the Infrastructure team does not have the bandwidth to support
> Gerrit,
> >> > we will continue to support our own instance of Gerrit for Kudu, and
> make
> >> > the necessary integrations such that commits are properly
> authenticated
> >> and
> >> > maintain sufficient provenance to uphold the ASF standards (e.g. via
> the
> >> > solution adopted by the AsterixDB podling).
> >> >
> >> > == Issue Tracking ==
> >> >
> >> > We would like to import our current JIRA project into the ASF JIRA,
> such
> >> > that our historical commit messages and code comments continue to
> >> reference
> >> > the appropriate bug numbers.
> >> >
> >> > == Initial Committers ==
> >> >
> >> >  * Adar Dembo adar@cloudera.com
> >> >  * Alex Feinberg alex@strlen.net
> >> >  * Andrew Wang wang@apache.org
> >> >  * Dan Burkert dan@cloudera.com
> >> >  * David Alves dralves@apache.org
> >> >  * Jean-Daniel Cryans jdcryans@apache.org
> >> >  * Mike Percy mpercy@apache.org
> >> >  * Misty Stanley-Jones misty@apache.org
> >> >  * Todd Lipcon todd@apache.org
> >> >
> >> > The initial list of committers was seeded by listing those
> contributors
> >> who
> >> > have contributed 20 or more patches in the last 12 months, indicating
> >> that
> >> > they are active and have achieved merit through participation on the
> >> > project. We chose not to include other contributors who either have
> not
> >> yet
> >> > contributed a significant number of patches, or whose contributions
> are
> >> far
> >> > in the past and we don’t expect to be active within the ASF.
> >> >
> >> > == Affiliations ==
> >> >
> >> >  * Adar Dembo - Cloudera
> >> >  * Alex Feinberg - Forward Networks
> >> >  * Andrew Wang - Cloudera
> >> >  * Dan Burkert - Cloudera
> >> >  * David Alves - Cloudera
> >> >  * Jean-Daniel Cryans - Cloudera
> >> >  * Mike Percy - Cloudera
> >> >  * Misty Stanley-Jones - Cloudera
> >> >  * Todd Lipcon - Cloudera
> >> >
> >> > == Sponsors ==
> >> >
> >> > === Champion ===
> >> >
> >> >  * Todd Lipcon
> >> >
> >> > === Nominated Mentors ===
> >> >
> >> >  * Jake Farrell - ASF Member and Infra team member, Acquia
> >> >  * Brock Noland - ASF Member, StreamSets
> >> >  * Michael Stack - ASF Member, Cloudera
> >> >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> >> >  * Chris Mattmann - ASF Member, NASA JPL and USC
> >> >  * Julien Le Dem - Incubator PMC, Dremio
> >> >  * Carl Steinbach - ASF Member, LinkedIn
> >> >
> >> > === Sponsoring Entity ===
> >> >
> >> > The Apache Incubator
> >> >
> >>
>
>
>


-- 
Julien

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by "Edward J. Yoon" <ed...@apache.org>.

+1 (binding)

On Wed, Nov 25, 2015 at 6:26 AM, Patrick Angeles
<pa...@gmail.com> wrote:
> +1 (non-binding)
>
> On Tue, Nov 24, 2015 at 4:23 PM, Jake Farrell <jf...@apache.org> wrote:
>
>> +1 (binding)
>>
>> -Jake



-- 
Best Regards, Edward J. Yoon

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Patrick Angeles <pa...@gmail.com>.

+1 (non-binding)

On Tue, Nov 24, 2015 at 4:23 PM, Jake Farrell <jf...@apache.org> wrote:

> +1 (binding)
>
> -Jake
>
> On Tue, Nov 24, 2015 at 2:32 PM, Todd Lipcon <to...@apache.org> wrote:
>
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
> to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -----
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations
> with
> > other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software
> Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated
> and
> > captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache
> Cassandra.
> > These systems allow for low-latency record-level reads and writes, but
> lag
> > far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up
> to
> > fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in
> some
> > situations, Kudu offers a “happy medium” alternative that can
> dramatically
> > simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper
> PDF:
> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source
> storage
> > ecosystem. After our initial open source project release in September
> 2015,
> > we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build
> an
> > equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> >  * Move the existing codebase, website, documentation, and mailing lists to
> > Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> >  * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> >
> > ==== Releases ====
> >
> > Kudu has undergone one public release, tagged here
> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >
> > This initial release was not performed in the typical ASF fashion -- no
> > source tarball was released, but rather only convenience binaries made
> > available in Cloudera’s repositories. We will adopt the ASF source release
> > process upon joining the incubator.
> >
> >
> > ==== Source ====
> >
> > Kudu’s source is currently hosted on GitHub at
> > https://github.com/cloudera/kudu
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> >
> >
> > ==== Code review ====
> >
> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >
> > The Kudu developer community is very happy with Gerrit and hopes to work
> > with the Apache Infrastructure team to figure out how we can continue to
> > use Gerrit within ASF policies.
> >
> >
> >
> > ==== Issue tracking ====
> >
> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/KUDU/summary
> >
> > This JIRA instance contains bugs and development discussion dating back 2
> > years prior to Kudu’s open source release and will provide an initial seed
> > for the ASF JIRA.
> >
> >
> >
> > ==== Community discussion ====
> >
> > Kudu has several public discussion forums, linked here:
> > http://getkudu.io/community.html
> >
> >
> >
> > ==== Build Infrastructure ====
> >
> > The Kudu Gerrit instance is configured to only allow patches to be
> > committed after running them through an extensive set of pre-commit tests
> > and code lints. The project currently makes use of elastic public cloud
> > resources to perform these tests. Until this point, these resources have
> > been internal to Cloudera, though we are currently investing in moving to a
> > publicly accessible infrastructure.
> >
> >
> >
> > ==== Development practices ====
> >
> > Given that Kudu is a persistent storage engine, the community has a high
> > quality bar for contributions to its core. We have a firm belief that high
> > quality is achieved through automation, not manual inspection, and hence
> > put a focus on thorough testing and build infrastructure to ensure that
> > bar. The development community also practices review-then-commit for all
> > changes to ensure that changes are accompanied by appropriate tests, are
> > well commented, etc.
> >
> > Rather than seeing these practices as barriers to contribution, we believe
> > that a fully automated and standardized review and testing practice makes
> > it easier for new contributors to have patches accepted. Any new developer
> > may post a patch to Gerrit using the same workflow as a seasoned
> > contributor, and the same suite of tests will be automatically run. If the
> > tests pass, a committer can quickly review and commit the contribution from
> > their web browser.
> >
> > === Meritocracy ===
> >
> > We believe strongly in meritocracy in electing committers and PMC members.
> > We believe that contributions can come in forms other than just code: for
> > example, one of our initial proposed committers has contributed solely in
> > the area of project documentation. We will encourage contributions and
> > participation of all types, and ensure that contributors are appropriately
> > recognized.
> >
> > === Community ===
> >
> > Though Kudu is relatively new as an open source project, it has already
> > seen promising growth in its community across several organizations:
> >
> >  * '''Cloudera''' is the original development sponsor for Kudu.
> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > production use case, contributing code, benchmarks, feedback, and
> > conference talks.
> >  * '''Intel''' has contributed optimizations related to their hardware
> > technologies.
> >  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> > use case, and has been contributing bug reports and product feedback.
> >  * '''Dremio''' is working on integration with Apache Drill and exploring
> > using Kudu in a production use case.
> >  * Several community-built Docker images, tutorials, and blog posts have
> > sprouted up since Kudu’s release.
> >
> >
> >
> > By bringing Kudu to Apache, we hope to encourage further contribution from
> > the above organizations as well as to engage new users and contributors in
> > the community.
> >
> > === Core Developers ===
> >
> > Kudu was initially developed as a project at Cloudera. Most of the
> > contributions to date have been by developers employed by Cloudera.
> >
> >
> >
> > Many of the developers are committers or PMC members on other Apache
> > projects.
> >
> > === Alignment ===
> >
> > As a project in the big data ecosystem, Kudu is aligned with several other
> > ASF projects. Kudu includes input/output format integration with Apache
> > Hadoop, and this integration can also provide a bridge to Apache Spark. We
> > are planning to integrate with Apache Hive in the near future. We also
> > integrate closely with Cloudera Impala, which is also currently being
> > proposed for incubation. We have also scheduled a hackathon with the Apache
> > Drill team to work on integration with that query engine.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> > in the initial development of the project, and intends to grow its
> > investment over time as Kudu becomes a product adopted by its customer
> > base. Several other organizations are also experimenting with Kudu for
> > production use cases which would live for many years.
> >
> > === Inexperience with Open Source ===
> >
> > Kudu has been released in the open for less than two months. However, from
> > our very first public announcement we have been committed to open-source
> > style development:
> >
> >  * our code reviews are fully public and documented on a mailing list
> >  * our daily development chatter is in a public chat room
> >  * we send out weekly “community status” reports highlighting news and
> > contributions
> >  * we published our entire JIRA history and discuss bugs in the open
> >  * we published our entire Git commit history, going back three years (no
> > squashing)
> >
> >
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects (Hadoop,
> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employees or former employees of Cloudera.
> > However, the committers are spread across multiple offices (Palo Alto, San
> > Francisco, Melbourne), so the team is familiar with working in a
> > distributed environment across varied time zones.
> >
> >
> >
> > The project has received some contributions from developers outside of
> > Cloudera, and is starting to attract a ''user'' community as well. We hope
> > to continue to encourage contributions from these developers and community
> > members and grow them into committers after they have had time to continue
> > their contributions.
> >
> > === Reliance on Salaried Developers ===
> >
> > As mentioned above, the majority of development up to this point has been
> > sponsored by Cloudera. We have seen several community users participate in
> > discussions who are hobbyists interested in distributed systems and
> > databases, and hope that they will continue their participation in the
> > project going forward.
> >
> > === Relationships with Other Apache Products ===
> >
> > Kudu is currently related to the following other Apache projects:
> >
> >  * Hadoop: Kudu provides MapReduce input/output formats for integration
> >  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> > and work is progressing on support for Spark Data Frames and Spark SQL.
> >
> >
> >
> > The Kudu team has reached out to several other Apache projects to start
> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> >
> >
> >
> > Kudu integrates with Impala, which is also being proposed for incubation.
> >
> >
> >
> > Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> > from the Apache Drill community.
> >
> >
> >
> > We look forward to continuing to integrate and collaborate with these
> > communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Many of the initial committers are already experienced Apache committers,
> > and understand the true value provided by the Apache Way and the principles
> > of the ASF. We believe that this development and contribution model is
> > especially appropriate for storage products, where Apache’s
> > community-over-code philosophy ensures long term viability and
> > consensus-based participation.
> >
> > == Documentation ==
> >
> >  * Documentation is written in AsciiDoc and committed in the Kudu source
> > repository:
> >
> >  * https://github.com/cloudera/kudu/tree/master/docs
> >
> >
> >
> >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> > above repository.
> >
> >  * A LaTeX whitepaper is also published, and the source is available within
> > the same repository.
> >  * APIs are documented within the source code as JavaDoc or C++-style
> > documentation comments.
> >  * Many design documents are stored within the source code repository as
> > text files next to the code being documented.
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > The Kudu codebase and web site are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Kudu is already
> > licensed under the Apache 2.0 license.
> >
> >
> >
> > Some portions of the code are imported from other open source projects
> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> > other than the initial committers. These copyright notices are maintained
> > in those files as well as a top-level NOTICE.txt file. We believe this to
> > be permissible under the license terms and ASF policies, and confirmed via
> > a recent thread on general@incubator.apache.org .
> >
> >
> >
> > The “Kudu” name is not a registered trademark, though before the initial
> > release of the project, we performed a trademark search and Cloudera’s
> > legal counsel deemed it acceptable in the context of a data storage engine.
> > There exists an unrelated open source project by the same name related to
> > deployments on Microsoft’s Azure cloud service. We have been in contact
> > with legal counsel from Microsoft and have obtained their approval for the
> > use of the Kudu name.
> >
> >
> >
> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> > kududb.io, et al) which will be transferred to the ASF and redirected to
> > the official page during incubation.
> >
> >
> >
> > Portions of Kudu are protected by pending or published patents owned by
> > Cloudera. Given the protections already granted by the Apache License, we
> > do not anticipate any explicit licensing or transfer of this intellectual
> > property.
> >
> > == External Dependencies ==
> >
> > The full set of dependencies and licenses are listed in
> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >
> > and summarized here:
> >
> >  * '''Twitter Bootstrap''': Apache 2.0
> >  * '''d3''': BSD 3-clause
> >  * '''epoch JS library''': MIT
> >  * '''lz4''': BSD 2-clause
> >  * '''gflags''': BSD 3-clause
> >  * '''glog''': BSD 3-clause
> >  * '''gperftools''': BSD 3-clause
> >  * '''libev''': BSD 2-clause
> >  * '''squeasel''': MIT
> >  * '''protobuf''': BSD 3-clause
> >  * '''rapidjson''': MIT
> >  * '''snappy''': BSD 3-clause
> >  * '''trace-viewer''': BSD 3-clause
> >  * '''zlib''': zlib license
> >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> >  * '''bitshuffle''': MIT
> >  * '''boost''': Boost license
> >  * '''curl''': MIT
> >  * '''libunwind''': MIT
> >  * '''nvml''': BSD 3-clause
> >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> >  * '''openssl''': OpenSSL License (BSD-alike)
> >
> >  * '''Guava''': Apache 2.0
> >  * '''StumbleUpon Async''': BSD
> >  * '''Apache Hadoop''': Apache 2.0
> >  * '''Apache log4j''': Apache 2.0
> >  * '''Netty''': Apache 2.0
> >  * '''slf4j''': MIT
> >  * '''Apache Commons''': Apache 2.0
> >  * '''murmur''': Apache 2.0
> >
> >
> > '''Build/test-only dependencies''':
> >
> >  * '''CMake''': BSD 3-clause
> >  * '''gcovr''': BSD 3-clause
> >  * '''gmock''': BSD 3-clause
> >  * '''Apache Maven''': Apache 2.0
> >  * '''JUnit''': EPL
> >  * '''Mockito''': MIT
> >
> > == Cryptography ==
> >
> > Kudu does not currently include any cryptography-related code.
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> >  * private@kudu.incubator.apache.org (PMC)
> >  * commits@kudu.incubator.apache.org (git push emails)
> >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> >  * user@kudu.incubator.apache.org (User questions)
> >
> >
> > === Repository ===
> >
> >  * git://git.apache.org/kudu
> >
> > === Gerrit ===
> >
> > We hope to continue using Gerrit for our code review and commit workflow.
> > The Kudu team has already been in contact with Jake Farrell to start
> > discussions on how Gerrit can fit into the ASF. We know that several other
> > ASF projects and podlings are also interested in Gerrit.
> >
> >
> >
> > If the Infrastructure team does not have the bandwidth to support Gerrit,
> > we will continue to support our own instance of Gerrit for Kudu, and make
> > the necessary integrations such that commits are properly authenticated and
> > maintain sufficient provenance to uphold the ASF standards (e.g. via the
> > solution adopted by the AsterixDB podling).
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to reference
> > the appropriate bug numbers.
> >
> > == Initial Committers ==
> >
> >  * Adar Dembo adar@cloudera.com
> >  * Alex Feinberg alex@strlen.net
> >  * Andrew Wang wang@apache.org
> >  * Dan Burkert dan@cloudera.com
> >  * David Alves dralves@apache.org
> >  * Jean-Daniel Cryans jdcryans@apache.org
> >  * Mike Percy mpercy@apache.org
> >  * Misty Stanley-Jones misty@apache.org
> >  * Todd Lipcon todd@apache.org
> >
> > The initial list of committers was seeded by listing those contributors who
> > have contributed 20 or more patches in the last 12 months, indicating that
> > they are active and have achieved merit through participation on the
> > project. We chose not to include other contributors who either have not yet
> > contributed a significant number of patches, or whose contributions are far
> > in the past and we don’t expect to be active within the ASF.
> >
> > == Affiliations ==
> >
> >  * Adar Dembo - Cloudera
> >  * Alex Feinberg - Forward Networks
> >  * Andrew Wang - Cloudera
> >  * Dan Burkert - Cloudera
> >  * David Alves - Cloudera
> >  * Jean-Daniel Cryans - Cloudera
> >  * Mike Percy - Cloudera
> >  * Misty Stanley-Jones - Cloudera
> >  * Todd Lipcon - Cloudera
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> >  * Todd Lipcon
> >
> > === Nominated Mentors ===
> >
> >  * Jake Farrell - ASF Member and Infra team member, Acquia
> >  * Brock Noland - ASF Member, StreamSets
> >  * Michael Stack - ASF Member, Cloudera
> >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> >  * Chris Mattmann - ASF Member, NASA JPL and USC
> >  * Julien Le Dem - Incubator PMC, Dremio
> >  * Carl Steinbach - ASF Member, LinkedIn
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator
> >
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Jake Farrell <jf...@apache.org>.

+1 (binding)

-Jake

On Tue, Nov 24, 2015 at 2:32 PM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and who we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Mike Percy <mp...@apache.org>.

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>

+1 (non-binding)

Mike

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Hyunsik Choi <hy...@apache.org>.

+1 (binding)

Good luck!

On Tue, Nov 24, 2015 at 5:29 PM, Sean Busbey <bu...@cloudera.com> wrote:
> +1 (binding)
>
> On Tue, Nov 24, 2015 at 1:32 PM, Todd Lipcon <to...@apache.org> wrote:
>
>> Hi all,
>>
>> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
>> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
>> pasted below and also available on the wiki at:
>> https://wiki.apache.org/incubator/KuduProposal
>>
>> The proposal is unchanged since the original version, except for the
>> addition of Carl Steinbach as a Mentor.
>>
>> Please cast your votes:
>>
>> [] +1, accept Kudu into the Incubator
>> [] +/-0, positive/negative non-counted expression of feelings
>> [] -1, do not accept Kudu into the incubator (please state reasoning)
>>
>> Given the US holiday this week, I imagine many folks are traveling or
>> otherwise offline. So, let's run the vote for a full week rather than the
>> traditional 72 hours. Unless the IPMC objects to the extended voting
>> period, the vote will close on Tues, Dec 1st at noon PST.
>>
>> Thanks
>> -Todd
>
>
>
> --
> Sean

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Sean Busbey <bu...@cloudera.com>.

+1 (binding)

On Tue, Nov 24, 2015 at 1:32 PM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al.). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al.).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. Several community users who are hobbyists interested
> in distributed systems and databases have participated in discussions, and
> we hope that they will continue their participation in the project going
> forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and this was
> confirmed via a recent thread on general@incubator.apache.org.
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al.) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>



-- 
Sean

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Andrei Savu <as...@apache.org>.

+1 (binding)

-- Andrei Savu

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al.). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al.).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. Several community users who are hobbyists interested
> in distributed systems and databases have participated in discussions, and
> we hope that they will continue their participation in the project going
> forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and this was
> confirmed via a recent thread on general@incubator.apache.org.
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al.) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Julian Hyde <jh...@apache.org>.

+1

> On Nov 24, 2015, at 11:33 AM, Todd Lipcon <to...@apache.org> wrote:
> 
> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
> 
>> Hi all,
>> 
>> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
>> to call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal
>> is pasted below and also available on the wiki at:
>> https://wiki.apache.org/incubator/KuduProposal
>> 
>> The proposal is unchanged since the original version, except for the
>> addition of Carl Steinbach as a Mentor.
>> 
>> Please cast your votes:
>> 
>> [] +1, accept Kudu into the Incubator
>> [] +/-0, positive/negative non-counted expression of feelings
>> [] -1, do not accept Kudu into the incubator (please state reasoning)
>> 
>> 
> I'll start the voting with my +1 (binding, assuming it's permitted to vote
> on your own proposal!)



Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Jacques Nadeau <ja...@apache.org>.

+1

Great to see this coming to the foundation!

On Tue, Nov 24, 2015 at 11:33 AM, Todd Lipcon <to...@apache.org> wrote:

> On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:
>
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
> > to call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal
> > is pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> >
> I'll start the voting with my +1 (binding, assuming it's permitted to vote
> on your own proposal!)
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Todd Lipcon <to...@apache.org>.

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
> to call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal
> is pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
>
I'll start the voting with my +1 (binding, assuming it's permitted to vote
on your own proposal!)

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Henry Robinson <he...@cloudera.com>.

+1 (non-binding).

Thanks,
Henry

On 27 November 2015 at 07:14, Andrew Bayer <an...@gmail.com> wrote:

> +1 binding
>
> On Thursday, November 26, 2015, Ted Dunning <te...@gmail.com> wrote:
>
> > +1 (binding)
> >
> > I think that forcing experienced community developers into one model or the
> > other is unnecessary. Let them in as they would like.
> >
> >
> >
> > On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein <gstein@gmail.com> wrote:
> >
> > > -1 (binding)
> > >
> > > Starting with RTC is a poor way to attract new community members. I'd like
> > > to see this community use CTR instead of mandating gerrit reviews.
> > >
> > > (ref: other-threads about lack of trust, and control issues; poor basis for
> > > a community)



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Andrew Bayer <an...@gmail.com>.

+1 binding

On Thursday, November 26, 2015, Ted Dunning <te...@gmail.com> wrote:

> +1 (binding)
>
> I think that forcing experienced community developers into one model or the
> other is unnecessary. Let them in as they would like.
>
>
>
> On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein <gstein@gmail.com> wrote:
>
> > -1 (binding)
> >
> > Starting with RTC is a poor way to attract new community members. I'd like
> > to see this community use CTR instead of mandating gerrit reviews.
> >
> > (ref: other-threads about lack of trust, and control issues; poor basis for
> > a community)
> >
> > On Tue, Nov 24, 2015 at 1:32 PM, Todd Lipcon <todd@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> > > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > > pasted below and also available on the wiki at:
> > > https://wiki.apache.org/incubator/KuduProposal
> > >
> > > The proposal is unchanged since the original version, except for the
> > > addition of Carl Steinbach as a Mentor.
> > >
> > > Please cast your votes:
> > >
> > > [] +1, accept Kudu into the Incubator
> > > [] +/-0, positive/negative non-counted expression of feelings
> > > [] -1, do not accept Kudu into the incubator (please state reasoning)
> > >
> > > Given the US holiday this week, I imagine many folks are traveling or
> > > otherwise offline. So, let's run the vote for a full week rather than the
> > > traditional 72 hours. Unless the IPMC objects to the extended voting
> > > period, the vote will close on Tues, Dec 1st at noon PST.
> > >
> > > Thanks
> > > -Todd
> > > -----
> > >
> > > = Kudu Proposal =
> > >
> > > == Abstract ==
> > >
> > > Kudu is a distributed columnar storage engine built for the Apache
> Hadoop
> > > ecosystem.
> > >
> > > == Proposal ==
> > >
> > > Kudu is an open source storage engine for structured data which
> supports
> > > low-latency random access together with efficient analytical access
> > > patterns. Kudu distributes data using horizontal partitioning and
> > > replicates each partition using Raft consensus, providing low
> > > mean-time-to-recovery and low tail latencies. Kudu is designed within
> the
> > > context of the Apache Hadoop ecosystem and supports many integrations
> > with
> > > other data analytics projects both inside and outside of the Apache
> > > Software Foundation.
> > >
> > >
> > >
> > > We propose to incubate Kudu as a project of the Apache Software
> > Foundation.
> > >
> > > == Background ==
> > >
> > > In recent years, explosive growth in the amount of data being generated
> > and
> > > captured by enterprises has resulted in the rapid adoption of open
> source
> > > technology which is able to store massive data sets at scale and at low
> > > cost. In particular, the Apache Hadoop ecosystem has become a focal
> point
> > > for such “big data” workloads, because many traditional open source
> > > database systems have lagged in offering a scalable alternative.
> > >
> > >
> > >
> > > Structured storage in the Hadoop ecosystem has typically been achieved
> in
> > > two ways: for static data sets, data is typically stored on Apache HDFS
> > > using binary data formats such as Apache Avro or Apache Parquet.
> However,
> > > neither HDFS nor these formats has any provision for updating
> individual
> > > records, or for efficient random access. Mutable data sets are
> typically
> > > stored in semi-structured stores such as Apache HBase or Apache
> > Cassandra.
> > > These systems allow for low-latency record-level reads and writes, but
> > lag
> > > far behind the static file formats in terms of sequential read
> throughput
> > > for applications such as SQL-based analytics or machine learning.
> > >
> > >
> > >
> > > Kudu is a new storage system designed and implemented from the ground
> up
> > to
> > > fill this gap between high-throughput sequential-access storage systems
> > > such as HDFS and low-latency random-access systems such as HBase or
> > > Cassandra. While these existing systems continue to hold advantages in
> > some
> > > situations, Kudu offers a “happy medium” alternative that can
> > dramatically
> > > simplify the architecture of many common workloads. In particular, Kudu
> > > offers a simple API for row-level inserts, updates, and deletes, while
> > > providing table scans at throughputs similar to Parquet, a
> commonly-used
> > > columnar format for static data.
> > >
> > >
> > >
> > > More information on Kudu can be found at the existing open source
> project
> > > website: http://getkudu.io and in particular in the Kudu white-paper
> > PDF:
> > > http://getkudu.io/kudu.pdf from which the above was excerpted.
> > >
> > > == Rationale ==
> > >
> > > As described above, Kudu fills an important gap in the open source
> > storage
> > > ecosystem. After our initial open source project release in September
> > 2015,
> > > we have seen a great amount of interest across a diverse set of users
> and
> > > companies. We believe that, as a storage system, it is critical to
> build
> > an
> > > equally diverse set of contributors in the development community. Our
> > > experiences as committers and PMC members on other Apache projects have
> > > taught us the value of diverse communities in ensuring both longevity
> and
> > > high quality for such foundational systems.
> > >
> > > == Initial Goals ==
> > >
> > >  * Move the existing codebase, website, documentation, and mailing
> lists
> > to
> > > Apache-hosted infrastructure
> > >  * Work with the infrastructure team to implement and approve our code
> > > review, build, and testing workflows in the context of the ASF
> > >  * Incremental development and releases per Apache guidelines
> > >
> > > == Current Status ==
> > >
> > > ==== Releases ====
> > >
> > > Kudu has undergone one public release, tagged here
> > > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> > >
> > > This initial release was not performed in the typical ASF fashion -- no
> > > source tarball was released, but rather only convenience binaries made
> > > available in Cloudera’s repositories. We will adopt the ASF source
> > release
> > > process upon joining the incubator.
> > >
> > >
> > > ==== Source ====
> > >
> > > Kudu’s source is currently hosted on GitHub at
> > > https://github.com/cloudera/kudu
> > >
> > > This repository will be transitioned to Apache’s git hosting during
> > > incubation.
> > >
> > >
> > >
> > > ==== Code review ====
> > >
> > > Kudu’s code reviews are currently public and hosted on Gerrit at
> > > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> > >
> > > The Kudu developer community is very happy with gerrit and hopes to
> work
> > > with the Apache Infrastructure team to figure out how we can continue
> to
> > > use Gerrit within ASF policies.
> > >
> > >
> > >
> > > ==== Issue tracking ====
> > >
> > > Kudu’s bug and feature tracking is hosted on JIRA at:
> > > https://issues.cloudera.org/projects/KUDU/summary
> > >
> > > This JIRA instance contains bugs and development discussion dating
> back 2
> > > years prior to Kudu’s open source release and will provide an initial
> > seed
> > > for the ASF JIRA.
> > >
> > >
> > >
> > > ==== Community discussion ====
> > >
> > > Kudu has several public discussion forums, linked here:
> > > http://getkudu.io/community.html
> > >
> > >
> > >
> > > ==== Build Infrastructure ====
> > >
> > > The Kudu Gerrit instance is configured to only allow patches to be
> > > committed after running them through an extensive set of pre-commit
> tests
> > > and code lints. The project currently makes use of elastic public cloud
> > > resources to perform these tests. Until this point, these resources
> have
> > > been internal to Cloudera, though we are currently investing in moving
> > to a
> > > publicly accessible infrastructure.
> > >
> > >
> > >
> > > ==== Development practices ====
> > >
> > > Given that Kudu is a persistent storage engine, the community has a
> high
> > > quality bar for contributions to its core. We have a firm belief that
> > high
> > > quality is achieved through automation, not manual inspection, and
> hence
> > > put a focus on thorough testing and build infrastructure to ensure that
> > > bar. The development community also practices review-then-commit for
> all
> > > changes to ensure that changes are accompanied by appropriate tests,
> are
> > > well commented, etc.
> > >
> > > Rather than seeing these practices as barriers to contribution, we
> > believe
> > > that a fully automated and standardized review and testing practice
> makes
> > > it easier for new contributors to have patches accepted. Any new
> > developer
> > > may post a patch to Gerrit using the same workflow as a seasoned
> > > contributor, and the same suite of tests will be automatically run. If
> > the
> > > tests pass, a committer can quickly review and commit the contribution
> > from
> > > their web browser.
> > >
> > > === Meritocracy ===
> > >
> > > We believe strongly in meritocracy in electing committers and PMC
> > members.
> > > We believe that contributions can come in forms other than just code:
> for
> > > example, one of our initial proposed committers has contributed solely
> in
> > > the area of project documentation. We will encourage contributions and
> > > participation of all types, and ensure that contributors are
> > appropriately
> > > recognized.
> > >
> > > === Community ===
> > >
> > > Though Kudu is relatively new as an open source project, it has already
> > > seen promising growth in its community across several organizations:
> > >
> > >  * '''Cloudera''' is the original development sponsor for Kudu.
> > >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > > production use case, contributing code, benchmarks, feedback, and
> > > conference talks.
> > >  * '''Intel''' has contributed optimizations related to their hardware
> > > technologies.
> > >  * '''Dropbox''' has been experimenting with Kudu for a machine
> > monitoring
> > > use case, and has been contributing bug reports and product feedback.
> > >  * '''Dremio''' is working on integration with Apache Drill and
> exploring
> > > using Kudu in a production use case.
> > >  * Several community-built Docker images, tutorials, and blog posts
> have
> > > sprouted up since Kudu’s release.
> > >
> > >
> > >
> > > By bringing Kudu to Apache, we hope to encourage further contribution
> > from
> > > the above organizations as well as to engage new users and contributors
> > in
> > > the community.
> > >
> > > === Core Developers ===
> > >
> > > Kudu was initially developed as a project at Cloudera. Most of the
> > > contributions to date have been by developers employed by Cloudera.
> > >
> > >
> > >
> > > Many of the developers are committers or PMC members on other Apache
> > > projects.
> > >
> > > === Alignment ===
> > >
> > > As a project in the big data ecosystem, Kudu is aligned with several
> > other
> > > ASF projects. Kudu includes input/output format integration with Apache
> > > Hadoop, and this integration can also provide a bridge to Apache Spark.
> > We
> > > are planning to integrate with Apache Hive in the near future. We also
> > > integrate closely with Cloudera Impala, which is also currently being
> > > proposed for incubation. We have also scheduled a hackathon with the
> > Apache
> > > Drill team to work on integration with that query engine.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned Products ===
> > >
> > > The risk of Kudu being abandoned is low. Cloudera has invested a great
> > deal
> > > in the initial development of the project, and intends to grow its
> > > investment over time as Kudu becomes a product adopted by its customer
> > > base. Several other organizations are also experimenting with Kudu for
> > > production use cases which would live for many years.
> > >
> > > === Inexperience with Open Source ===
> > >
> > > Kudu has been released in the open for less than two months. However,
> > from
> > > our very first public announcement we have been committed to
> open-source
> > > style development:
> > >
> > >  * our code reviews are fully public and documented on a mailing list
> > >  * our daily development chatter is in a public chat room
> > >  * we send out weekly “community status” reports highlighting news and
> > > contributions
> > >  * we published our entire JIRA history and discuss bugs in the open
> > >  * we published our entire Git commit history, going back three years
> (no
> > > squashing)
> > >
> > >
> > >
> > > Several of the initial committers are experienced open source
> developers,
> > > several being committers and/or PMC members on other ASF projects
> > (Hadoop,
> > > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > > experience on non-ASF open source projects (Kiji, open-vm-tools, et
> al).
> > >
> > > === Homogenous Developers ===
> > >
> > > The initial committers are employees or former employees of Cloudera.
> > > However, the committers are spread across multiple offices (Palo Alto,
> > San
> > > Francisco, Melbourne), so the team is familiar with working in a
> > > distributed environment across varied time zones.
> > >
> > >
> > >
> > > The project has received some contributions from developers outside of
> > > Cloudera, and is starting to attract a ''user'' community as well. We
> > hope
> > > to continue to encourage contributions from these developers and
> > community
> > > members and grow them into committers after they have had time to
> > continue
> > > their contributions.
> > >
> > > === Reliance on Salaried Developers ===
> > >
> > > As mentioned above, the majority of development up to this point has
> been
> > > sponsored by Cloudera. We have seen several community users participate
> > in
> > > discussions who are hobbyists interested in distributed systems and
> > > databases, and hope that they will continue their participation in the
> > > project going forward.
> > >
> > > === Relationships with Other Apache Products ===
> > >
> > > Kudu is currently related to the following other Apache projects:
> > >
> > >  * Hadoop: Kudu provides MapReduce input/output formats for integration
> > >  * Spark: Kudu integrates with Spark via the above-mentioned input
> > formats,
> > > and work is progressing on support for Spark Data Frames and Spark SQL.
> > >
> > >
> > >
> > > The Kudu team has reached out to several other Apache projects to start
> > > discussing integrations, including Flume, Kafka, Hive, and Drill.
> > >
> > >
> > >
> > > Kudu integrates with Impala, which is also being proposed for
> incubation.
> > >
> > >
> > >
> > > Kudu is already collaborating on ValueVector, a proposed TLP spinning
> out
> > > from the Apache Drill community.
> > >
> > >
> > >
> > > We look forward to continuing to integrate and collaborate with these
> > > communities.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > >
> > > Many of the initial committers are already experienced Apache
> committers,
> > > and understand the true value provided by the Apache Way and the
> > principles
> > > of the ASF. We believe that this development and contribution model is
> > > especially appropriate for storage products, where Apache’s
> > > community-over-code philosophy ensures long term viability and
> > > consensus-based participation.
> > >
> > > == Documentation ==
> > >
> > >  * Documentation is written in AsciiDoc and committed in the Kudu
> source
> > > repository:
> > >
> > >  * https://github.com/cloudera/kudu/tree/master/docs
> > >
> > >
> > >
> > >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of
> > the
> > > above repository.
> > >
> > >  * A LaTeX whitepaper is also published, and the source is available
> > within
> > > the same repository.
> > >  * APIs are documented within the source code as JavaDoc or C++-style
> > > documentation comments.
> > >  * Many design documents are stored within the source code repository
> as
> > > text files next to the code being documented.
> > >
> > > == Source and Intellectual Property Submission Plan ==
> > >
> > > The Kudu codebase and web site is currently hosted on GitHub and will
> be
> > > transitioned to the ASF repositories during incubation. Kudu is already
> > > licensed under the Apache 2.0 license.
> > >
> > >
> > >
> > > Some portions of the code are imported from other open source projects
> > > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by
> > authors
> > > other than the initial committers. These copyright notices are
> maintained
> > > in those files as well as a top-level NOTICE.txt file. We believe this
> to
> > > be permissible under the license terms and ASF policies, and confirmed
> > via
> > > a recent thread on general@incubator.apache.org <javascript:;> .
> > >
> > >
> > >
> > > The “Kudu” name is not a registered trademark, though before the
> initial
> > > release of the project, we performed a trademark search and Cloudera’s
> > > legal counsel deemed it acceptable in the context of a data storage
> > engine.
> > > There exists an unrelated open source project by the same name related
> to
> > > deployments on Microsoft’s Azure cloud service. We have been in contact
> > > with legal counsel from Microsoft and have obtained their approval for
> > the
> > > use of the Kudu name.
> > >
> > >
> > >
> > > Cloudera currently owns several domain names related to Kudu (
> getkudu.io
> > ,
> > > kududb.io, et al) which will be transferred to the ASF and redirected
> to
> > > the official page during incubation.
> > >
> > >
> > >
> > > Portions of Kudu are protected by pending or published patents owned by
> > > Cloudera. Given the protections already granted by the Apache License,
> we
> > > do not anticipate any explicit licensing or transfer of this
> intellectual
> > > property.
> > >
> > > == External Dependencies ==
> > >
> > > The full set of dependencies and licenses are listed in
> > > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> > >
> > > and summarized here:
> > >
> > >  * '''Twitter Bootstrap''': Apache 2.0
> > >  * '''d3''': BSD 3-clause
> > >  * '''epoch JS library''': MIT
> > >  * '''lz4''': BSD 2-clause
> > >  * '''gflags''': BSD 3-clause
> > >  * '''glog''': BSD 3-clause
> > >  * '''gperftools''': BSD 3-clause
> > >  * '''libev''': BSD 2-clause
> > >  * '''squeasel''':MIT license
> > >  * '''protobuf''': BSD 3-clause
> > >  * '''rapidjson''': MIT
> > >  * '''snappy''': BSD 3-clause
> > >  * '''trace-viewer''': BSD 3-clause
> > >  * '''zlib''': zlib license
> > >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> > >  * '''bitshuffle''': MIT
> > >  * '''boost''': Boost license
> > >  * '''curl''': MIT
> > >  * '''libunwind''': MIT
> > >  * '''nvml''': BSD 3-clause
> > >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> > >  * '''openssl''': OpenSSL License (BSD-alike)
> > >
> > >  * '''Guava''': Apache 2.0
> > >  * '''StumbleUpon Async''': BSD
> > >  * '''Apache Hadoop''': Apache 2.0
> > >  * '''Apache log4j''': Apache 2.0
> > >  * '''Netty''': Apache 2.0
> > >  * '''slf4j''': MIT
> > >  * '''Apache Commons''': Apache 2.0
> > >  * '''murmur''': Apache 2.0
> > >
> > >
> > > '''Build/test-only dependencies''':
> > >
> > >  * '''CMake''': BSD 3-clause
> > >  * '''gcovr''': BSD 3-clause
> > >  * '''gmock''': BSD 3-clause
> > >  * '''Apache Maven''': Apache 2.0
> > >  * '''JUnit''': EPL
> > >  * '''Mockito''': MIT
> > >
> > > == Cryptography ==
> > >
> > > Kudu does not currently include any cryptography-related code.
> > >
> > > == Required Resources ==
> > >
> > > === Mailing lists ===
> > >
> > >  * private@kudu.incubator.apache.org <javascript:;> (PMC)
> > >  * commits@kudu.incubator.apache.org <javascript:;> (git push emails)
> > >  * issues@kudu.incubator.apache.org <javascript:;> (JIRA issue feed)
> > >  * dev@kudu.incubator.apache.org <javascript:;> (Gerrit code reviews
> plus dev
> > discussion)
> > >  * user@kudu.incubator.apache.org <javascript:;> (User questions)
> > >
> > >
> > > === Repository ===
> > >
> > >  * git://git.apache.org/kudu
> > >
> > > === Gerrit ===
> > >
> > > We hope to continue using Gerrit for our code review and commit
> workflow.
> > > The Kudu team has already been in contact with Jake Farrell to start
> > > discussions on how Gerrit can fit into the ASF. We know that several
> > other
> > > ASF projects and podlings are also interested in Gerrit.
> > >
> > >
> > >
> > > If the Infrastructure team does not have the bandwidth to support
> Gerrit,
> > > we will continue to support our own instance of Gerrit for Kudu, and
> make
> > > the necessary integrations such that commits are properly authenticated
> > and
> > > maintain sufficient provenance to uphold the ASF standards (e.g. via
> the
> > > solution adopted by the AsterixDB podling).
> > >
> > > == Issue Tracking ==
> > >
> > > We would like to import our current JIRA project into the ASF JIRA,
> such
> > > that our historical commit messages and code comments continue to
> > reference
> > > the appropriate bug numbers.
> > >
> > > == Initial Committers ==
> > >
> > >  * Adar Dembo adar@cloudera.com <javascript:;>
> > >  * Alex Feinberg alex@strlen.net <javascript:;>
> > >  * Andrew Wang wang@apache.org <javascript:;>
> > >  * Dan Burkert dan@cloudera.com <javascript:;>
> > >  * David Alves dralves@apache.org <javascript:;>
> > >  * Jean-Daniel Cryans jdcryans@apache.org <javascript:;>
> > >  * Mike Percy mpercy@apache.org <javascript:;>
> > >  * Misty Stanley-Jones misty@apache.org <javascript:;>
> > >  * Todd Lipcon todd@apache.org <javascript:;>
> > >
> > > The initial list of committers was seeded by listing those contributors
> > who
> > > have contributed 20 or more patches in the last 12 months, indicating
> > that
> > > they are active and have achieved merit through participation on the
> > > project. We chose not to include other contributors who either have not
> > yet
> > > contributed a significant number of patches, or whose contributions are
> > far
> > > in the past and we don’t expect to be active within the ASF.
> > >
> > > == Affiliations ==
> > >
> > >  * Adar Dembo - Cloudera
> > >  * Alex Feinberg - Forward Networks
> > >  * Andrew Wang - Cloudera
> > >  * Dan Burkert - Cloudera
> > >  * David Alves - Cloudera
> > >  * Jean-Daniel Cryans - Cloudera
> > >  * Mike Percy - Cloudera
> > >  * Misty Stanley-Jones - Cloudera
> > >  * Todd Lipcon - Cloudera
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >
> > >  * Todd Lipcon
> > >
> > > === Nominated Mentors ===
> > >
> > >  * Jake Farrell - ASF Member and Infra team member, Acquia
> > >  * Brock Noland - ASF Member, StreamSets
> > >  * Michael Stack - ASF Member, Cloudera
> > >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> > >  * Chris Mattmann - ASF Member, NASA JPL and USC
> > >  * Julien Le Dem - Incubator PMC, Dremio
> > >  * Carl Steinbach - ASF Member, LinkedIn
> > >
> > > === Sponsoring Entity ===
> > >
> > > The Apache Incubator
> > >
> >
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Ted Dunning <te...@gmail.com>.

+1 (binding)

I think that forcing experienced community developers into one model or the
other is unnecessary. Let them in as they would like.



On Wed, Nov 25, 2015 at 4:51 PM, Greg Stein <gs...@gmail.com> wrote:

> -1 (binding)
>
> Starting with RTC is a poor way to attract new community members. I'd like
> to see this community use CTR instead of mandating gerrit reviews.
>
> (ref: other-threads about lack of trust, and control issues; poor basis for
> a community)
>
> On Tue, Nov 24, 2015 at 1:32 PM, Todd Lipcon <to...@apache.org> wrote:
>
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
> > to call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -----
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations
> > with other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated
> > and captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache
> > Cassandra. These systems allow for low-latency record-level reads and writes, but
> > lag far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up
> > to fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in
> > some situations, Kudu offers a “happy medium” alternative that can
> > dramatically simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper
> > PDF: http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source
> > storage ecosystem. After our initial open source project release in September
> > 2015, we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build
> > an equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> >  * Move the existing codebase, website, documentation, and mailing lists
> > to Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> >  * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> >
> > ==== Releases ====
> >
> > Kudu has undergone one public release, tagged here
> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >
> > This initial release was not performed in the typical ASF fashion -- no
> > source tarball was released, but rather only convenience binaries made
> > available in Cloudera’s repositories. We will adopt the ASF source
> > release process upon joining the incubator.
> >
> >
> > ==== Source ====
> >
> > Kudu’s source is currently hosted on GitHub at
> > https://github.com/cloudera/kudu
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> >
> >
> > ==== Code review ====
> >
> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >
> > The Kudu developer community is very happy with gerrit and hopes to work
> > with the Apache Infrastructure team to figure out how we can continue to
> > use Gerrit within ASF policies.
> >
> >
> >
> > ==== Issue tracking ====
> >
> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/KUDU/summary
> >
> > This JIRA instance contains bugs and development discussion dating back 2
> > years prior to Kudu’s open source release and will provide an initial
> > seed for the ASF JIRA.
> >
> >
> >
> > ==== Community discussion ====
> >
> > Kudu has several public discussion forums, linked here:
> > http://getkudu.io/community.html
> >
> >
> >
> > ==== Build Infrastructure ====
> >
> > The Kudu Gerrit instance is configured to only allow patches to be
> > committed after running them through an extensive set of pre-commit tests
> > and code lints. The project currently makes use of elastic public cloud
> > resources to perform these tests. Until this point, these resources have
> > been internal to Cloudera, though we are currently investing in moving
> > to a publicly accessible infrastructure.
> >
> >
> >
> > ==== Development practices ====
> >
> > Given that Kudu is a persistent storage engine, the community has a high
> > quality bar for contributions to its core. We have a firm belief that
> > high quality is achieved through automation, not manual inspection, and hence
> > put a focus on thorough testing and build infrastructure to ensure that
> > bar. The development community also practices review-then-commit for all
> > changes to ensure that changes are accompanied by appropriate tests, are
> > well commented, etc.
> >
> > Rather than seeing these practices as barriers to contribution, we
> > believe that a fully automated and standardized review and testing practice makes
> > it easier for new contributors to have patches accepted. Any new
> > developer may post a patch to Gerrit using the same workflow as a seasoned
> > contributor, and the same suite of tests will be automatically run. If
> > the tests pass, a committer can quickly review and commit the contribution
> > from their web browser.
> >
> > === Meritocracy ===
> >
> > We believe strongly in meritocracy in electing committers and PMC members.
> > We believe that contributions can come in forms other than just code: for
> > example, one of our initial proposed committers has contributed solely in
> > the area of project documentation. We will encourage contributions and
> > participation of all types, and ensure that contributors are
> > appropriately recognized.
> >
> > === Community ===
> >
> > Though Kudu is relatively new as an open source project, it has already
> > seen promising growth in its community across several organizations:
> >
> >  * '''Cloudera''' is the original development sponsor for Kudu.
> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > production use case, contributing code, benchmarks, feedback, and
> > conference talks.
> >  * '''Intel''' has contributed optimizations related to their hardware
> > technologies.
> >  * '''Dropbox''' has been experimenting with Kudu for a machine
> > monitoring use case, and has been contributing bug reports and product feedback.
> >  * '''Dremio''' is working on integration with Apache Drill and exploring
> > using Kudu in a production use case.
> >  * Several community-built Docker images, tutorials, and blog posts have
> > sprouted up since Kudu’s release.
> >
> >
> >
> > By bringing Kudu to Apache, we hope to encourage further contribution
> > from the above organizations as well as to engage new users and contributors
> > in the community.
> >
> > === Core Developers ===
> >
> > Kudu was initially developed as a project at Cloudera. Most of the
> > contributions to date have been by developers employed by Cloudera.
> >
> >
> >
> > Many of the developers are committers or PMC members on other Apache
> > projects.
> >
> > === Alignment ===
> >
> > As a project in the big data ecosystem, Kudu is aligned with several
> > other ASF projects. Kudu includes input/output format integration with Apache
> > Hadoop, and this integration can also provide a bridge to Apache Spark.
> We
> > are planning to integrate with Apache Hive in the near future. We also
> > integrate closely with Cloudera Impala, which is also currently being
> > proposed for incubation. We have also scheduled a hackathon with the
> Apache
> > Drill team to work on integration with that query engine.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> > in the initial development of the project, and intends to grow its
> > investment over time as Kudu becomes a product adopted by its customer
> > base. Several other organizations are also experimenting with Kudu for
> > production use cases which would live for many years.
> >
> > === Inexperience with Open Source ===
> >
> > Kudu has been released in the open for less than two months. However, from
> > our very first public announcement we have been committed to open-source
> > style development:
> >
> >  * our code reviews are fully public and documented on a mailing list
> >  * our daily development chatter is in a public chat room
> >  * we send out weekly “community status” reports highlighting news and
> > contributions
> >  * we published our entire JIRA history and discuss bugs in the open
> >  * we published our entire Git commit history, going back three years (no
> > squashing)
> >
> >
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects (Hadoop,
> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employees or former employees of Cloudera.
> > However, the committers are spread across multiple offices (Palo Alto, San
> > Francisco, Melbourne), so the team is familiar with working in a
> > distributed environment across varied time zones.
> >
> >
> >
> > The project has received some contributions from developers outside of
> > Cloudera, and is starting to attract a ''user'' community as well. We hope
> > to continue to encourage contributions from these developers and community
> > members and grow them into committers after they have had time to continue
> > their contributions.
> >
> > === Reliance on Salaried Developers ===
> >
> > As mentioned above, the majority of development up to this point has been
> > sponsored by Cloudera. We have seen several community users participate in
> > discussions who are hobbyists interested in distributed systems and
> > databases, and hope that they will continue their participation in the
> > project going forward.
> >
> > === Relationships with Other Apache Products ===
> >
> > Kudu is currently related to the following other Apache projects:
> >
> >  * Hadoop: Kudu provides MapReduce input/output formats for integration
> >  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> > and work is progressing on support for Spark Data Frames and Spark SQL.
> >
> >
> >
> > The Kudu team has reached out to several other Apache projects to start
> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> >
> >
> >
> > Kudu integrates with Impala, which is also being proposed for incubation.
> >
> >
> >
> > Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> > from the Apache Drill community.
> >
> >
> >
> > We look forward to continuing to integrate and collaborate with these
> > communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Many of the initial committers are already experienced Apache committers,
> > and understand the true value provided by the Apache Way and the principles
> > of the ASF. We believe that this development and contribution model is
> > especially appropriate for storage products, where Apache’s
> > community-over-code philosophy ensures long term viability and
> > consensus-based participation.
> >
> > == Documentation ==
> >
> >  * Documentation is written in AsciiDoc and committed in the Kudu source
> > repository:
> >
> >  * https://github.com/cloudera/kudu/tree/master/docs
> >
> >
> >
> >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> > above repository.
> >
> >  * A LaTeX whitepaper is also published, and the source is available within
> > the same repository.
> >  * APIs are documented within the source code as JavaDoc or C++-style
> > documentation comments.
> >  * Many design documents are stored within the source code repository as
> > text files next to the code being documented.
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > The Kudu codebase and web site are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Kudu is already
> > licensed under the Apache 2.0 license.
> >
> >
> >
> > Some portions of the code are imported from other open source projects
> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> > other than the initial committers. These copyright notices are maintained
> > in those files as well as a top-level NOTICE.txt file. We believe this to
> > be permissible under the license terms and ASF policies, and confirmed via
> > a recent thread on general@incubator.apache.org.
> >
> >
> >
> > The “Kudu” name is not a registered trademark, though before the initial
> > release of the project, we performed a trademark search and Cloudera’s
> > legal counsel deemed it acceptable in the context of a data storage engine.
> > There exists an unrelated open source project by the same name related to
> > deployments on Microsoft’s Azure cloud service. We have been in contact
> > with legal counsel from Microsoft and have obtained their approval for the
> > use of the Kudu name.
> >
> >
> >
> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> > kududb.io, et al) which will be transferred to the ASF and redirected to
> > the official page during incubation.
> >
> >
> >
> > Portions of Kudu are protected by pending or published patents owned by
> > Cloudera. Given the protections already granted by the Apache License, we
> > do not anticipate any explicit licensing or transfer of this intellectual
> > property.
> >
> > == External Dependencies ==
> >
> > The full set of dependencies and licenses is listed in
> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >
> > and summarized here:
> >
> >  * '''Twitter Bootstrap''': Apache 2.0
> >  * '''d3''': BSD 3-clause
> >  * '''epoch JS library''': MIT
> >  * '''lz4''': BSD 2-clause
> >  * '''gflags''': BSD 3-clause
> >  * '''glog''': BSD 3-clause
> >  * '''gperftools''': BSD 3-clause
> >  * '''libev''': BSD 2-clause
> >  * '''squeasel''': MIT
> >  * '''protobuf''': BSD 3-clause
> >  * '''rapidjson''': MIT
> >  * '''snappy''': BSD 3-clause
> >  * '''trace-viewer''': BSD 3-clause
> >  * '''zlib''': zlib license
> >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> >  * '''bitshuffle''': MIT
> >  * '''boost''': Boost license
> >  * '''curl''': MIT
> >  * '''libunwind''': MIT
> >  * '''nvml''': BSD 3-clause
> >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> >  * '''openssl''': OpenSSL License (BSD-alike)
> >
> >  * '''Guava''': Apache 2.0
> >  * '''StumbleUpon Async''': BSD
> >  * '''Apache Hadoop''': Apache 2.0
> >  * '''Apache log4j''': Apache 2.0
> >  * '''Netty''': Apache 2.0
> >  * '''slf4j''': MIT
> >  * '''Apache Commons''': Apache 2.0
> >  * '''murmur''': Apache 2.0
> >
> >
> > '''Build/test-only dependencies''':
> >
> >  * '''CMake''': BSD 3-clause
> >  * '''gcovr''': BSD 3-clause
> >  * '''gmock''': BSD 3-clause
> >  * '''Apache Maven''': Apache 2.0
> >  * '''JUnit''': EPL
> >  * '''Mockito''': MIT
> >
> > == Cryptography ==
> >
> > Kudu does not currently include any cryptography-related code.
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> >  * private@kudu.incubator.apache.org (PMC)
> >  * commits@kudu.incubator.apache.org (git push emails)
> >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> >  * user@kudu.incubator.apache.org (User questions)
> >
> >
> > === Repository ===
> >
> >  * git://git.apache.org/kudu
> >
> > === Gerrit ===
> >
> > We hope to continue using Gerrit for our code review and commit workflow.
> > The Kudu team has already been in contact with Jake Farrell to start
> > discussions on how Gerrit can fit into the ASF. We know that several other
> > ASF projects and podlings are also interested in Gerrit.
> >
> >
> >
> > If the Infrastructure team does not have the bandwidth to support Gerrit,
> > we will continue to support our own instance of Gerrit for Kudu, and make
> > the necessary integrations such that commits are properly authenticated and
> > maintain sufficient provenance to uphold the ASF standards (e.g. via the
> > solution adopted by the AsterixDB podling).
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to reference
> > the appropriate bug numbers.
> >
> > == Initial Committers ==
> >
> >  * Adar Dembo adar@cloudera.com
> >  * Alex Feinberg alex@strlen.net
> >  * Andrew Wang wang@apache.org
> >  * Dan Burkert dan@cloudera.com
> >  * David Alves dralves@apache.org
> >  * Jean-Daniel Cryans jdcryans@apache.org
> >  * Mike Percy mpercy@apache.org
> >  * Misty Stanley-Jones misty@apache.org
> >  * Todd Lipcon todd@apache.org
> >
> > The initial list of committers was seeded by listing those contributors who
> > have contributed 20 or more patches in the last 12 months, indicating that
> > they are active and have achieved merit through participation on the
> > project. We chose not to include other contributors who either have not yet
> > contributed a significant number of patches, or whose contributions are far
> > in the past and we don’t expect to be active within the ASF.
> >
> > == Affiliations ==
> >
> >  * Adar Dembo - Cloudera
> >  * Alex Feinberg - Forward Networks
> >  * Andrew Wang - Cloudera
> >  * Dan Burkert - Cloudera
> >  * David Alves - Cloudera
> >  * Jean-Daniel Cryans - Cloudera
> >  * Mike Percy - Cloudera
> >  * Misty Stanley-Jones - Cloudera
> >  * Todd Lipcon - Cloudera
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> >  * Todd Lipcon
> >
> > === Nominated Mentors ===
> >
> >  * Jake Farrell - ASF Member and Infra team member, Acquia
> >  * Brock Noland - ASF Member, StreamSets
> >  * Michael Stack - ASF Member, Cloudera
> >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> >  * Chris Mattmann - ASF Member, NASA JPL and USC
> >  * Julien Le Dem - Incubator PMC, Dremio
> >  * Carl Steinbach - ASF Member, LinkedIn
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator
> >
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Greg Stein <gs...@gmail.com>.

-1 (binding)

Starting with RTC is a poor way to attract new community members. I'd like
to see this community use CTR instead of mandating gerrit reviews.

(ref: other-threads about lack of trust, and control issues; poor basis for
a community)


Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Andrew Purtell <ap...@apache.org>.

+1 (binding)

Good luck!


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Brock Noland <br...@apache.org>.

+1 (binding)

On Tuesday, November 24, 2015, Carl Steinbach <cw...@apache.org> wrote:

> +1 (binding)
>
>
> On Tue, Nov 24, 2015 at 5:39 PM, John D. Ament <johndament@apache.org>
> wrote:
>
> > +1
> > On Nov 24, 2015 14:33, "Todd Lipcon" <todd@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > Discussion on the [DISCUSS] thread seems to have wound down, so I'd
> like
> > to
> > > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal
> is
> > > pasted below and also available on the wiki at:
> > > https://wiki.apache.org/incubator/KuduProposal
> > >
> > > The proposal is unchanged since the original version, except for the
> > > addition of Carl Steinbach as a Mentor.
> > >
> > > Please cast your votes:
> > >
> > > [] +1, accept Kudu into the Incubator
> > > [] +/-0, positive/negative non-counted expression of feelings
> > > [] -1, do not accept Kudu into the incubator (please state reasoning)
> > >
> > > Given the US holiday this week, I imagine many folks are traveling or
> > > otherwise offline. So, let's run the vote for a full week rather than
> the
> > > traditional 72 hours. Unless the IPMC objects to the extended voting
> > > period, the vote will close on Tues, Dec 1st at noon PST.
> > >
> > > Thanks
> > > -Todd
> > > -----
> > >
> > > = Kudu Proposal =
> > >
> > > == Abstract ==
> > >
> > > Kudu is a distributed columnar storage engine built for the Apache
> Hadoop
> > > ecosystem.
> > >
> > > == Proposal ==
> > >
> > > Kudu is an open source storage engine for structured data which
> supports
> > > low-latency random access together with efficient analytical access
> > > patterns. Kudu distributes data using horizontal partitioning and
> > > replicates each partition using Raft consensus, providing low
> > > mean-time-to-recovery and low tail latencies. Kudu is designed within
> the
> > > context of the Apache Hadoop ecosystem and supports many integrations
> > with
> > > other data analytics projects both inside and outside of the Apache
> > > Software Foundation.
> > >
> > >
> > >
> > > We propose to incubate Kudu as a project of the Apache Software
> > Foundation.
> > >
> > > == Background ==
> > >
> > > In recent years, explosive growth in the amount of data being generated
> > and
> > > captured by enterprises has resulted in the rapid adoption of open
> source
> > > technology which is able to store massive data sets at scale and at low
> > > cost. In particular, the Apache Hadoop ecosystem has become a focal
> point
> > > for such “big data” workloads, because many traditional open source
> > > database systems have lagged in offering a scalable alternative.
> > >
> > >
> > >
> > > Structured storage in the Hadoop ecosystem has typically been achieved
> in
> > > two ways: for static data sets, data is typically stored on Apache HDFS
> > > using binary data formats such as Apache Avro or Apache Parquet.
> However,
> > > neither HDFS nor these formats has any provision for updating
> individual
> > > records, or for efficient random access. Mutable data sets are
> typically
> > > stored in semi-structured stores such as Apache HBase or Apache
> > Cassandra.
> > > These systems allow for low-latency record-level reads and writes, but
> > lag
> > > far behind the static file formats in terms of sequential read
> throughput
> > > for applications such as SQL-based analytics or machine learning.
> > >
> > >
> > >
> > > Kudu is a new storage system designed and implemented from the ground
> up
> > to
> > > fill this gap between high-throughput sequential-access storage systems
> > > such as HDFS and low-latency random-access systems such as HBase or
> > > Cassandra. While these existing systems continue to hold advantages in
> > some
> > > situations, Kudu offers a “happy medium” alternative that can
> > dramatically
> > > simplify the architecture of many common workloads. In particular, Kudu
> > > offers a simple API for row-level inserts, updates, and deletes, while
> > > providing table scans at throughputs similar to Parquet, a
> commonly-used
> > > columnar format for static data.
> > >
> > >
> > >
> > > More information on Kudu can be found at the existing open source
> project
> > > website: http://getkudu.io and in particular in the Kudu white-paper
> > PDF:
> > > http://getkudu.io/kudu.pdf from which the above was excerpted.
> > >
> > > == Rationale ==
> > >
> > > As described above, Kudu fills an important gap in the open source
> > storage
> > > ecosystem. After our initial open source project release in September
> > 2015,
> > > we have seen a great amount of interest across a diverse set of users
> and
> > > companies. We believe that, as a storage system, it is critical to
> build
> > an
> > > equally diverse set of contributors in the development community. Our
> > > experiences as committers and PMC members on other Apache projects have
> > > taught us the value of diverse communities in ensuring both longevity
> and
> > > high quality for such foundational systems.
> > >
> > > == Initial Goals ==
> > >
> > >  * Move the existing codebase, website, documentation, and mailing
> lists
> > to
> > > Apache-hosted infrastructure
> > >  * Work with the infrastructure team to implement and approve our code
> > > review, build, and testing workflows in the context of the ASF
> > >  * Incremental development and releases per Apache guidelines
> > >
> > > == Current Status ==
> > >
> > > ==== Releases ====
> > >
> > > Kudu has undergone one public release, tagged here
> > > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> > >
> > > This initial release was not performed in the typical ASF fashion -- no
> > > source tarball was released, but rather only convenience binaries made
> > > available in Cloudera’s repositories. We will adopt the ASF source
> > release
> > > process upon joining the incubator.
> > >
> > >
> > > ==== Source ====
> > >
> > > Kudu’s source is currently hosted on GitHub at
> > > https://github.com/cloudera/kudu
> > >
> > > This repository will be transitioned to Apache’s git hosting during
> > > incubation.
> > >
> > >
> > >
> > > ==== Code review ====
> > >
> > > Kudu’s code reviews are currently public and hosted on Gerrit at
> > > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> > >
> > > The Kudu developer community is very happy with gerrit and hopes to
> work
> > > with the Apache Infrastructure team to figure out how we can continue
> to
> > > use Gerrit within ASF policies.
> > >
> > >
> > >
> > > ==== Issue tracking ====
> > >
> > > Kudu’s bug and feature tracking is hosted on JIRA at:
> > > https://issues.cloudera.org/projects/KUDU/summary
> > >
> > > This JIRA instance contains bugs and development discussion dating
> back 2
> > > years prior to Kudu’s open source release and will provide an initial
> > seed
> > > for the ASF JIRA.
> > >
> > >
> > >
> > > ==== Community discussion ====
> > >
> > > Kudu has several public discussion forums, linked here:
> > > http://getkudu.io/community.html
> > >
> > >
> > >
> > > ==== Build Infrastructure ====
> > >
> > > The Kudu Gerrit instance is configured to only allow patches to be
> > > committed after running them through an extensive set of pre-commit
> tests
> > > and code lints. The project currently makes use of elastic public cloud
> > > resources to perform these tests. Until this point, these resources
> have
> > > been internal to Cloudera, though we are currently investing in moving
> > to a
> > > publicly accessible infrastructure.
> > >
> > >
> > >
> > > ==== Development practices ====
> > >
> > > Given that Kudu is a persistent storage engine, the community has a
> high
> > > quality bar for contributions to its core. We have a firm belief that
> > high
> > > quality is achieved through automation, not manual inspection, and
> hence
> > > put a focus on thorough testing and build infrastructure to ensure that
> > > bar. The development community also practices review-then-commit for
> all
> > > changes to ensure that changes are accompanied by appropriate tests,
> are
> > > well commented, etc.
> > >
> > > Rather than seeing these practices as barriers to contribution, we
> > believe
> > > that a fully automated and standardized review and testing practice
> makes
> > > it easier for new contributors to have patches accepted. Any new
> > developer
> > > may post a patch to Gerrit using the same workflow as a seasoned
> > > contributor, and the same suite of tests will be automatically run. If
> > the
> > > tests pass, a committer can quickly review and commit the contribution
> > from
> > > their web browser.
> > >
> > > === Meritocracy ===
> > >
> > > We believe strongly in meritocracy in electing committers and PMC
> > members.
> > > We believe that contributions can come in forms other than just code:
> for
> > > example, one of our initial proposed committers has contributed solely
> in
> > > the area of project documentation. We will encourage contributions and
> > > participation of all types, and ensure that contributors are
> > appropriately
> > > recognized.
> > >
> > > === Community ===
> > >
> > > Though Kudu is relatively new as an open source project, it has already
> > > seen promising growth in its community across several organizations:
> > >
> > >  * '''Cloudera''' is the original development sponsor for Kudu.
> > >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > > production use case, contributing code, benchmarks, feedback, and
> > > conference talks.
> > >  * '''Intel''' has contributed optimizations related to their hardware
> > > technologies.
> > >  * '''Dropbox''' has been experimenting with Kudu for a machine
> > monitoring
> > > use case, and has been contributing bug reports and product feedback.
> > >  * '''Dremio''' is working on integration with Apache Drill and
> exploring
> > > using Kudu in a production use case.
> > >  * Several community-built Docker images, tutorials, and blog posts
> have
> > > sprouted up since Kudu’s release.
> > >
> > >
> > >
> > > By bringing Kudu to Apache, we hope to encourage further contribution
> > from
> > > the above organizations as well as to engage new users and contributors
> > in
> > > the community.
> > >
> > > === Core Developers ===
> > >
> > > Kudu was initially developed as a project at Cloudera. Most of the
> > > contributions to date have been by developers employed by Cloudera.
> > >
> > >
> > >
> > > Many of the developers are committers or PMC members on other Apache
> > > projects.
> > >
> > > === Alignment ===
> > >
> > > As a project in the big data ecosystem, Kudu is aligned with several
> > other
> > > ASF projects. Kudu includes input/output format integration with Apache
> > > Hadoop, and this integration can also provide a bridge to Apache Spark.
> > We
> > > are planning to integrate with Apache Hive in the near future. We also
> > > integrate closely with Cloudera Impala, which is also currently being
> > > proposed for incubation. We have also scheduled a hackathon with the
> > Apache
> > > Drill team to work on integration with that query engine.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned Products ===
> > >
> > > The risk of Kudu being abandoned is low. Cloudera has invested a great
> > deal
> > > in the initial development of the project, and intends to grow its
> > > investment over time as Kudu becomes a product adopted by its customer
> > > base. Several other organizations are also experimenting with Kudu for
> > > production use cases which would live for many years.
> > >
> > > === Inexperience with Open Source ===
> > >
> > > Kudu has been released in the open for less than two months. However,
> > from
> > > our very first public announcement we have been committed to
> open-source
> > > style development:
> > >
> > >  * our code reviews are fully public and documented on a mailing list
> > >  * our daily development chatter is in a public chat room
> > >  * we send out weekly “community status” reports highlighting news and
> > > contributions
> > >  * we published our entire JIRA history and discuss bugs in the open
> > >  * we published our entire Git commit history, going back three years
> (no
> > > squashing)
> > >
> > >
> > >
> > > Several of the initial committers are experienced open source
> developers,
> > > several being committers and/or PMC members on other ASF projects
> > (Hadoop,
> > > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > > experience on non-ASF open source projects (Kiji, open-vm-tools, et
> al).
> > >
> > > === Homogenous Developers ===
> > >
> > > The initial committers are employees or former employees of Cloudera.
> > > However, the committers are spread across multiple offices (Palo Alto,
> > San
> > > Francisco, Melbourne), so the team is familiar with working in a
> > > distributed environment across varied time zones.
> > >
> > >
> > >
> > > The project has received some contributions from developers outside of
> > > Cloudera, and is starting to attract a ''user'' community as well. We
> > hope
> > > to continue to encourage contributions from these developers and
> > community
> > > members and grow them into committers after they have had time to
> > continue
> > > their contributions.
> > >
> > > === Reliance on Salaried Developers ===
> > >
> > > As mentioned above, the majority of development up to this point has
> been
> > > sponsored by Cloudera. We have seen several community users participate
> > in
> > > discussions who are hobbyists interested in distributed systems and
> > > databases, and hope that they will continue their participation in the
> > > project going forward.
> > >
> > > === Relationships with Other Apache Products ===
> > >
> > > Kudu is currently related to the following other Apache projects:
> > >
> > >  * Hadoop: Kudu provides MapReduce input/output formats for integration
> > >  * Spark: Kudu integrates with Spark via the above-mentioned input
> > > formats, and work is progressing on support for Spark Data Frames and
> > > Spark SQL.
> > >
> > >
> > >
> > > The Kudu team has reached out to several other Apache projects to start
> > > discussing integrations, including Flume, Kafka, Hive, and Drill.
> > >
> > >
> > >
> > > Kudu integrates with Impala, which is also being proposed for
> incubation.
> > >
> > >
> > >
> > > Kudu is already collaborating on ValueVector, a proposed TLP spinning
> out
> > > from the Apache Drill community.
> > >
> > >
> > >
> > > We look forward to continuing to integrate and collaborate with these
> > > communities.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > >
> > > Many of the initial committers are already experienced Apache
> committers,
> > > and understand the true value provided by the Apache Way and the
> > principles
> > > of the ASF. We believe that this development and contribution model is
> > > especially appropriate for storage products, where Apache’s
> > > community-over-code philosophy ensures long term viability and
> > > consensus-based participation.
> > >
> > > == Documentation ==
> > >
> > >  * Documentation is written in AsciiDoc and committed in the Kudu
> > > source repository:
> > >
> > >  * https://github.com/cloudera/kudu/tree/master/docs
> > >
> > >
> > >
> > >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of
> > > the above repository.
> > >
> > >  * A LaTeX whitepaper is also published, and the source is available
> > > within the same repository.
> > >  * APIs are documented within the source code as JavaDoc or C++-style
> > > documentation comments.
> > >  * Many design documents are stored within the source code repository
> > > as text files next to the code being documented.
> > >
> > > == Source and Intellectual Property Submission Plan ==
> > >
> > > The Kudu codebase and web site are currently hosted on GitHub and will
> > > be transitioned to the ASF repositories during incubation. Kudu is already
> > > licensed under the Apache 2.0 license.
> > >
> > >
> > >
> > > Some portions of the code are imported from other open source projects
> > > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by
> > authors
> > > other than the initial committers. These copyright notices are
> maintained
> > > in those files as well as a top-level NOTICE.txt file. We believe this
> to
> > > be permissible under the license terms and ASF policies, and confirmed
> > via
> > > a recent thread on general@incubator.apache.org.
> > >
> > >
> > >
> > > The “Kudu” name is not a registered trademark, though before the
> initial
> > > release of the project, we performed a trademark search and Cloudera’s
> > > legal counsel deemed it acceptable in the context of a data storage
> > engine.
> > > There exists an unrelated open source project by the same name related
> to
> > > deployments on Microsoft’s Azure cloud service. We have been in contact
> > > with legal counsel from Microsoft and have obtained their approval for
> > the
> > > use of the Kudu name.
> > >
> > >
> > >
> > > Cloudera currently owns several domain names related to Kudu
> > > (getkudu.io, kududb.io, et al) which will be transferred to the ASF and
> > > redirected to the official page during incubation.
> > >
> > >
> > >
> > > Portions of Kudu are protected by pending or published patents owned by
> > > Cloudera. Given the protections already granted by the Apache License,
> we
> > > do not anticipate any explicit licensing or transfer of this
> intellectual
> > > property.
> > >
> > > == External Dependencies ==
> > >
> > > The full set of dependencies and licenses is listed in
> > > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> > >
> > > and summarized here:
> > >
> > >  * '''Twitter Bootstrap''': Apache 2.0
> > >  * '''d3''': BSD 3-clause
> > >  * '''epoch JS library''': MIT
> > >  * '''lz4''': BSD 2-clause
> > >  * '''gflags''': BSD 3-clause
> > >  * '''glog''': BSD 3-clause
> > >  * '''gperftools''': BSD 3-clause
> > >  * '''libev''': BSD 2-clause
> > >  * '''squeasel''': MIT
> > >  * '''protobuf''': BSD 3-clause
> > >  * '''rapidjson''': MIT
> > >  * '''snappy''': BSD 3-clause
> > >  * '''trace-viewer''': BSD 3-clause
> > >  * '''zlib''': zlib license
> > >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> > >  * '''bitshuffle''': MIT
> > >  * '''boost''': Boost license
> > >  * '''curl''': MIT
> > >  * '''libunwind''': MIT
> > >  * '''nvml''': BSD 3-clause
> > >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> > >  * '''openssl''': OpenSSL License (BSD-alike)
> > >
> > >  * '''Guava''': Apache 2.0
> > >  * '''StumbleUpon Async''': BSD
> > >  * '''Apache Hadoop''': Apache 2.0
> > >  * '''Apache log4j''': Apache 2.0
> > >  * '''Netty''': Apache 2.0
> > >  * '''slf4j''': MIT
> > >  * '''Apache Commons''': Apache 2.0
> > >  * '''murmur''': Apache 2.0
> > >
> > >
> > > '''Build/test-only dependencies''':
> > >
> > >  * '''CMake''': BSD 3-clause
> > >  * '''gcovr''': BSD 3-clause
> > >  * '''gmock''': BSD 3-clause
> > >  * '''Apache Maven''': Apache 2.0
> > >  * '''JUnit''': EPL
> > >  * '''Mockito''': MIT
> > >
> > > == Cryptography ==
> > >
> > > Kudu does not currently include any cryptography-related code.
> > >
> > > == Required Resources ==
> > >
> > > === Mailing lists ===
> > >
> > >  * private@kudu.incubator.apache.org (PMC)
> > >  * commits@kudu.incubator.apache.org (git push emails)
> > >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> > >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev
> > > discussion)
> > >  * user@kudu.incubator.apache.org (User questions)
> > >
> > >
> > > === Repository ===
> > >
> > >  * git://git.apache.org/kudu
> > >
> > > === Gerrit ===
> > >
> > > We hope to continue using Gerrit for our code review and commit
> workflow.
> > > The Kudu team has already been in contact with Jake Farrell to start
> > > discussions on how Gerrit can fit into the ASF. We know that several
> > other
> > > ASF projects and podlings are also interested in Gerrit.
> > >
> > >
> > >
> > > If the Infrastructure team does not have the bandwidth to support
> Gerrit,
> > > we will continue to support our own instance of Gerrit for Kudu, and
> make
> > > the necessary integrations such that commits are properly authenticated
> > and
> > > maintain sufficient provenance to uphold the ASF standards (e.g. via
> the
> > > solution adopted by the AsterixDB podling).
> > >
> > > == Issue Tracking ==
> > >
> > > We would like to import our current JIRA project into the ASF JIRA,
> such
> > > that our historical commit messages and code comments continue to
> > reference
> > > the appropriate bug numbers.
> > >
> > > == Initial Committers ==
> > >
> > >  * Adar Dembo adar@cloudera.com
> > >  * Alex Feinberg alex@strlen.net
> > >  * Andrew Wang wang@apache.org
> > >  * Dan Burkert dan@cloudera.com
> > >  * David Alves dralves@apache.org
> > >  * Jean-Daniel Cryans jdcryans@apache.org
> > >  * Mike Percy mpercy@apache.org
> > >  * Misty Stanley-Jones misty@apache.org
> > >  * Todd Lipcon todd@apache.org
> > >
> > > The initial list of committers was seeded by listing those contributors
> > > who have contributed 20 or more patches in the last 12 months, indicating
> > > that they are active and have achieved merit through participation on the
> > > project. We chose not to include other contributors who either have not
> > > yet contributed a significant number of patches, or whose contributions
> > > are far in the past and whom we don’t expect to be active within the ASF.
> > >
> > > == Affiliations ==
> > >
> > >  * Adar Dembo - Cloudera
> > >  * Alex Feinberg - Forward Networks
> > >  * Andrew Wang - Cloudera
> > >  * Dan Burkert - Cloudera
> > >  * David Alves - Cloudera
> > >  * Jean-Daniel Cryans - Cloudera
> > >  * Mike Percy - Cloudera
> > >  * Misty Stanley-Jones - Cloudera
> > >  * Todd Lipcon - Cloudera
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >
> > >  * Todd Lipcon
> > >
> > > === Nominated Mentors ===
> > >
> > >  * Jake Farrell - ASF Member and Infra team member, Acquia
> > >  * Brock Noland - ASF Member, StreamSets
> > >  * Michael Stack - ASF Member, Cloudera
> > >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> > >  * Chris Mattmann - ASF Member, NASA JPL and USC
> > >  * Julien Le Dem - Incubator PMC, Dremio
> > >  * Carl Steinbach - ASF Member, LinkedIn
> > >
> > > === Sponsoring Entity ===
> > >
> > > The Apache Incubator
> > >
> >
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by Carl Steinbach <cw...@apache.org>.

+1 (binding)


On Tue, Nov 24, 2015 at 5:39 PM, John D. Ament <jo...@apache.org>
wrote:

> +1
> On Nov 24, 2015 14:33, "Todd Lipcon" <to...@apache.org> wrote:
>
> > Hi all,
> >
> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like
> to
> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> > pasted below and also available on the wiki at:
> > https://wiki.apache.org/incubator/KuduProposal
> >
> > The proposal is unchanged since the original version, except for the
> > addition of Carl Steinbach as a Mentor.
> >
> > Please cast your votes:
> >
> > [] +1, accept Kudu into the Incubator
> > [] +/-0, positive/negative non-counted expression of feelings
> > [] -1, do not accept Kudu into the incubator (please state reasoning)
> >
> > Given the US holiday this week, I imagine many folks are traveling or
> > otherwise offline. So, let's run the vote for a full week rather than the
> > traditional 72 hours. Unless the IPMC objects to the extended voting
> > period, the vote will close on Tues, Dec 1st at noon PST.
> >
> > Thanks
> > -Todd
> > -----
> >
> > = Kudu Proposal =
> >
> > == Abstract ==
> >
> > Kudu is a distributed columnar storage engine built for the Apache Hadoop
> > ecosystem.
> >
> > == Proposal ==
> >
> > Kudu is an open source storage engine for structured data which supports
> > low-latency random access together with efficient analytical access
> > patterns. Kudu distributes data using horizontal partitioning and
> > replicates each partition using Raft consensus, providing low
> > mean-time-to-recovery and low tail latencies. Kudu is designed within the
> > context of the Apache Hadoop ecosystem and supports many integrations
> with
> > other data analytics projects both inside and outside of the Apache
> > Software Foundation.
> >
> >
> >
> > We propose to incubate Kudu as a project of the Apache Software
> Foundation.
> >
> > == Background ==
> >
> > In recent years, explosive growth in the amount of data being generated
> and
> > captured by enterprises has resulted in the rapid adoption of open source
> > technology which is able to store massive data sets at scale and at low
> > cost. In particular, the Apache Hadoop ecosystem has become a focal point
> > for such “big data” workloads, because many traditional open source
> > database systems have lagged in offering a scalable alternative.
> >
> >
> >
> > Structured storage in the Hadoop ecosystem has typically been achieved in
> > two ways: for static data sets, data is typically stored on Apache HDFS
> > using binary data formats such as Apache Avro or Apache Parquet. However,
> > neither HDFS nor these formats has any provision for updating individual
> > records, or for efficient random access. Mutable data sets are typically
> > stored in semi-structured stores such as Apache HBase or Apache
> Cassandra.
> > These systems allow for low-latency record-level reads and writes, but
> lag
> > far behind the static file formats in terms of sequential read throughput
> > for applications such as SQL-based analytics or machine learning.
> >
> >
> >
> > Kudu is a new storage system designed and implemented from the ground up
> to
> > fill this gap between high-throughput sequential-access storage systems
> > such as HDFS and low-latency random-access systems such as HBase or
> > Cassandra. While these existing systems continue to hold advantages in
> some
> > situations, Kudu offers a “happy medium” alternative that can
> dramatically
> > simplify the architecture of many common workloads. In particular, Kudu
> > offers a simple API for row-level inserts, updates, and deletes, while
> > providing table scans at throughputs similar to Parquet, a commonly-used
> > columnar format for static data.
> >
> >
> >
> > More information on Kudu can be found at the existing open source project
> > website: http://getkudu.io and in particular in the Kudu white-paper
> PDF:
> > http://getkudu.io/kudu.pdf from which the above was excerpted.
> >
> > == Rationale ==
> >
> > As described above, Kudu fills an important gap in the open source
> storage
> > ecosystem. After our initial open source project release in September
> 2015,
> > we have seen a great amount of interest across a diverse set of users and
> > companies. We believe that, as a storage system, it is critical to build
> an
> > equally diverse set of contributors in the development community. Our
> > experiences as committers and PMC members on other Apache projects have
> > taught us the value of diverse communities in ensuring both longevity and
> > high quality for such foundational systems.
> >
> > == Initial Goals ==
> >
> >  * Move the existing codebase, website, documentation, and mailing lists
> to
> > Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> > review, build, and testing workflows in the context of the ASF
> >  * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> >
> > ==== Releases ====
> >
> > Kudu has undergone one public release, tagged here
> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >
> > This initial release was not performed in the typical ASF fashion -- no
> > source tarball was released, but rather only convenience binaries made
> > available in Cloudera’s repositories. We will adopt the ASF source
> release
> > process upon joining the incubator.
> >
> >
> > ==== Source ====
> >
> > Kudu’s source is currently hosted on GitHub at
> > https://github.com/cloudera/kudu
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> >
> >
> > ==== Code review ====
> >
> > Kudu’s code reviews are currently public and hosted on Gerrit at
> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >
> > The Kudu developer community is very happy with Gerrit and hopes to work
> > with the Apache Infrastructure team to figure out how we can continue to
> > use Gerrit within ASF policies.
> >
> >
> >
> > ==== Issue tracking ====
> >
> > Kudu’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/KUDU/summary
> >
> > This JIRA instance contains bugs and development discussion dating back 2
> > years prior to Kudu’s open source release and will provide an initial
> seed
> > for the ASF JIRA.
> >
> >
> >
> > ==== Community discussion ====
> >
> > Kudu has several public discussion forums, linked here:
> > http://getkudu.io/community.html
> >
> >
> >
> > ==== Build Infrastructure ====
> >
> > The Kudu Gerrit instance is configured to only allow patches to be
> > committed after running them through an extensive set of pre-commit tests
> > and code lints. The project currently makes use of elastic public cloud
> > resources to perform these tests. Until this point, these resources have
> > been internal to Cloudera, though we are currently investing in moving
> to a
> > publicly accessible infrastructure.
> >
> >
> >
> > ==== Development practices ====
> >
> > Given that Kudu is a persistent storage engine, the community has a high
> > quality bar for contributions to its core. We have a firm belief that
> high
> > quality is achieved through automation, not manual inspection, and hence
> > put a focus on thorough testing and build infrastructure to ensure that
> > bar. The development community also practices review-then-commit for all
> > changes to ensure that changes are accompanied by appropriate tests, are
> > well commented, etc.
> >
> > Rather than seeing these practices as barriers to contribution, we
> believe
> > that a fully automated and standardized review and testing practice makes
> > it easier for new contributors to have patches accepted. Any new
> developer
> > may post a patch to Gerrit using the same workflow as a seasoned
> > contributor, and the same suite of tests will be automatically run. If
> the
> > tests pass, a committer can quickly review and commit the contribution
> from
> > their web browser.
> >
> > === Meritocracy ===
> >
> > We believe strongly in meritocracy in electing committers and PMC
> members.
> > We believe that contributions can come in forms other than just code: for
> > example, one of our initial proposed committers has contributed solely in
> > the area of project documentation. We will encourage contributions and
> > participation of all types, and ensure that contributors are
> appropriately
> > recognized.
> >
> > === Community ===
> >
> > Though Kudu is relatively new as an open source project, it has already
> > seen promising growth in its community across several organizations:
> >
> >  * '''Cloudera''' is the original development sponsor for Kudu.
> >  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> > production use case, contributing code, benchmarks, feedback, and
> > conference talks.
> >  * '''Intel''' has contributed optimizations related to their hardware
> > technologies.
> >  * '''Dropbox''' has been experimenting with Kudu for a machine
> monitoring
> > use case, and has been contributing bug reports and product feedback.
> >  * '''Dremio''' is working on integration with Apache Drill and exploring
> > using Kudu in a production use case.
> >  * Several community-built Docker images, tutorials, and blog posts have
> > sprouted up since Kudu’s release.
> >
> >
> >
> > By bringing Kudu to Apache, we hope to encourage further contribution
> from
> > the above organizations as well as to engage new users and contributors
> in
> > the community.
> >
> > === Core Developers ===
> >
> > Kudu was initially developed as a project at Cloudera. Most of the
> > contributions to date have been by developers employed by Cloudera.
> >
> >
> >
> > Many of the developers are committers or PMC members on other Apache
> > projects.
> >
> > === Alignment ===
> >
> > As a project in the big data ecosystem, Kudu is aligned with several
> other
> > ASF projects. Kudu includes input/output format integration with Apache
> > Hadoop, and this integration can also provide a bridge to Apache Spark.
> We
> > are planning to integrate with Apache Hive in the near future. We also
> > integrate closely with Cloudera Impala, which is also currently being
> > proposed for incubation. We have also scheduled a hackathon with the
> Apache
> > Drill team to work on integration with that query engine.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Kudu being abandoned is low. Cloudera has invested a great
> deal
> > in the initial development of the project, and intends to grow its
> > investment over time as Kudu becomes a product adopted by its customer
> > base. Several other organizations are also experimenting with Kudu for
> > production use cases which would live for many years.
> >
> > === Inexperience with Open Source ===
> >
> > Kudu has been released in the open for less than two months. However,
> from
> > our very first public announcement we have been committed to open-source
> > style development:
> >
> >  * our code reviews are fully public and documented on a mailing list
> >  * our daily development chatter is in a public chat room
> >  * we send out weekly “community status” reports highlighting news and
> > contributions
> >  * we published our entire JIRA history and discuss bugs in the open
> >  * we published our entire Git commit history, going back three years (no
> > squashing)
> >
> >
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects
> (Hadoop,
> > HBase, Thrift, Flume, et al). Those who are not ASF committers have
> > experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employees or former employees of Cloudera.
> > However, the committers are spread across multiple offices (Palo Alto,
> San
> > Francisco, Melbourne), so the team is familiar with working in a
> > distributed environment across varied time zones.
> >
> >
> >
> > The project has received some contributions from developers outside of
> > Cloudera, and is starting to attract a ''user'' community as well. We
> hope
> > to continue to encourage contributions from these developers and
> community
> > members and grow them into committers after they have had time to
> continue
> > their contributions.
> >
> > === Reliance on Salaried Developers ===
> >
> > As mentioned above, the majority of development up to this point has been
> > sponsored by Cloudera. We have seen several community users participate
> in
> > discussions who are hobbyists interested in distributed systems and
> > databases, and hope that they will continue their participation in the
> > project going forward.
> >
> > === Relationships with Other Apache Products ===
> >
> > Kudu is currently related to the following other Apache projects:
> >
> >  * Hadoop: Kudu provides MapReduce input/output formats for integration
> >  * Spark: Kudu integrates with Spark via the above-mentioned input
> > formats, and work is progressing on support for Spark Data Frames and
> > Spark SQL.
> >
> >
> >
> > The Kudu team has reached out to several other Apache projects to start
> > discussing integrations, including Flume, Kafka, Hive, and Drill.
> >
> >
> >
> > Kudu integrates with Impala, which is also being proposed for incubation.
> >
> >
> >
> > Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> > from the Apache Drill community.
> >
> >
> >
> > We look forward to continuing to integrate and collaborate with these
> > communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Many of the initial committers are already experienced Apache committers,
> > and understand the true value provided by the Apache Way and the
> principles
> > of the ASF. We believe that this development and contribution model is
> > especially appropriate for storage products, where Apache’s
> > community-over-code philosophy ensures long term viability and
> > consensus-based participation.
> >
> > == Documentation ==
> >
> >  * Documentation is written in AsciiDoc and committed in the Kudu source
> > repository:
> >
> >  * https://github.com/cloudera/kudu/tree/master/docs
> >
> >
> >
> >  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of
> > the above repository.
> >
> >  * A LaTeX whitepaper is also published, and the source is available
> > within the same repository.
> >  * APIs are documented within the source code as JavaDoc or C++-style
> > documentation comments.
> >  * Many design documents are stored within the source code repository as
> > text files next to the code being documented.
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > The Kudu codebase and web site are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Kudu is already
> > licensed under the Apache 2.0 license.
> >
> >
> >
> > Some portions of the code are imported from other open source projects
> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by
> authors
> > other than the initial committers. These copyright notices are maintained
> > in those files as well as a top-level NOTICE.txt file. We believe this to
> > be permissible under the license terms and ASF policies, and confirmed
> via
> > a recent thread on general@incubator.apache.org.
> >
> >
> >
> > The “Kudu” name is not a registered trademark, though before the initial
> > release of the project, we performed a trademark search and Cloudera’s
> > legal counsel deemed it acceptable in the context of a data storage
> engine.
> > There exists an unrelated open source project by the same name related to
> > deployments on Microsoft’s Azure cloud service. We have been in contact
> > with legal counsel from Microsoft and have obtained their approval for
> the
> > use of the Kudu name.
> >
> >
> >
> > Cloudera currently owns several domain names related to Kudu (getkudu.io,
> > kududb.io, et al) which will be transferred to the ASF and redirected to
> > the official page during incubation.
> >
> >
> >
> > Portions of Kudu are protected by pending or published patents owned by
> > Cloudera. Given the protections already granted by the Apache License, we
> > do not anticipate any explicit licensing or transfer of this intellectual
> > property.
> >
> > == External Dependencies ==
> >
> > The full set of dependencies and licenses is listed in
> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >
> > and summarized here:
> >
> >  * '''Twitter Bootstrap''': Apache 2.0
> >  * '''d3''': BSD 3-clause
> >  * '''epoch JS library''': MIT
> >  * '''lz4''': BSD 2-clause
> >  * '''gflags''': BSD 3-clause
> >  * '''glog''': BSD 3-clause
> >  * '''gperftools''': BSD 3-clause
> >  * '''libev''': BSD 2-clause
> >  * '''squeasel''': MIT
> >  * '''protobuf''': BSD 3-clause
> >  * '''rapidjson''': MIT
> >  * '''snappy''': BSD 3-clause
> >  * '''trace-viewer''': BSD 3-clause
> >  * '''zlib''': zlib license
> >  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> >  * '''bitshuffle''': MIT
> >  * '''boost''': Boost license
> >  * '''curl''': MIT
> >  * '''libunwind''': MIT
> >  * '''nvml''': BSD 3-clause
> >  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> >  * '''openssl''': OpenSSL License (BSD-alike)
> >
> >  * '''Guava''': Apache 2.0
> >  * '''StumbleUpon Async''': BSD
> >  * '''Apache Hadoop''': Apache 2.0
> >  * '''Apache log4j''': Apache 2.0
> >  * '''Netty''': Apache 2.0
> >  * '''slf4j''': MIT
> >  * '''Apache Commons''': Apache 2.0
> >  * '''murmur''': Apache 2.0
> >
> >
> > '''Build/test-only dependencies''':
> >
> >  * '''CMake''': BSD 3-clause
> >  * '''gcovr''': BSD 3-clause
> >  * '''gmock''': BSD 3-clause
> >  * '''Apache Maven''': Apache 2.0
> >  * '''JUnit''': EPL
> >  * '''Mockito''': MIT
> >
> > == Cryptography ==
> >
> > Kudu does not currently include any cryptography-related code.
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> >  * private@kudu.incubator.apache.org (PMC)
> >  * commits@kudu.incubator.apache.org (git push emails)
> >  * issues@kudu.incubator.apache.org (JIRA issue feed)
> >  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev
> > discussion)
> >  * user@kudu.incubator.apache.org (User questions)
> >
> >
> > === Repository ===
> >
> >  * git://git.apache.org/kudu
> >
> > === Gerrit ===
> >
> > We hope to continue using Gerrit for our code review and commit workflow.
> > The Kudu team has already been in contact with Jake Farrell to start
> > discussions on how Gerrit can fit into the ASF. We know that several
> other
> > ASF projects and podlings are also interested in Gerrit.
> >
> >
> >
> > If the Infrastructure team does not have the bandwidth to support Gerrit,
> > we will continue to support our own instance of Gerrit for Kudu, and make
> > the necessary integrations such that commits are properly authenticated
> and
> > maintain sufficient provenance to uphold the ASF standards (e.g. via the
> > solution adopted by the AsterixDB podling).
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to
> reference
> > the appropriate bug numbers.
> >
> > == Initial Committers ==
> >
> >  * Adar Dembo adar@cloudera.com
> >  * Alex Feinberg alex@strlen.net
> >  * Andrew Wang wang@apache.org
> >  * Dan Burkert dan@cloudera.com
> >  * David Alves dralves@apache.org
> >  * Jean-Daniel Cryans jdcryans@apache.org
> >  * Mike Percy mpercy@apache.org
> >  * Misty Stanley-Jones misty@apache.org
> >  * Todd Lipcon todd@apache.org
> >
> > The initial list of committers was seeded by listing those contributors
> > who have contributed 20 or more patches in the last 12 months, indicating
> > that they are active and have achieved merit through participation on the
> > project. We chose not to include other contributors who either have not
> > yet contributed a significant number of patches, or whose contributions
> > are far in the past and whom we don’t expect to be active within the ASF.
> >
> > == Affiliations ==
> >
> >  * Adar Dembo - Cloudera
> >  * Alex Feinberg - Forward Networks
> >  * Andrew Wang - Cloudera
> >  * Dan Burkert - Cloudera
> >  * David Alves - Cloudera
> >  * Jean-Daniel Cryans - Cloudera
> >  * Mike Percy - Cloudera
> >  * Misty Stanley-Jones - Cloudera
> >  * Todd Lipcon - Cloudera
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> >  * Todd Lipcon
> >
> > === Nominated Mentors ===
> >
> >  * Jake Farrell - ASF Member and Infra team member, Acquia
> >  * Brock Noland - ASF Member, StreamSets
> >  * Michael Stack - ASF Member, Cloudera
> >  * Jarek Jarcec Cecho - ASF Member, Cloudera
> >  * Chris Mattmann - ASF Member, NASA JPL and USC
> >  * Julien Le Dem - Incubator PMC, Dremio
> >  * Carl Steinbach - ASF Member, LinkedIn
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator
> >
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by "John D. Ament" <jo...@apache.org>.

+1
On Nov 24, 2015 14:33, "Todd Lipcon" <to...@apache.org> wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with Gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark DataFrames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses is listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>

Re: [VOTE] Accept Kudu into the Apache Incubator

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.

+1 from me.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: <to...@cloudera.com> on behalf of Todd Lipcon <to...@apache.org>
Reply-To: "general@incubator.apache.org" <ge...@incubator.apache.org>
Date: Tuesday, November 24, 2015 at 11:32 AM
To: "general@incubator.apache.org" <ge...@incubator.apache.org>
Subject: [VOTE] Accept Kudu into the Apache Incubator

>Hi all,
>
>Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
>call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
>pasted below and also available on the wiki at:
>https://wiki.apache.org/incubator/KuduProposal
>
>The proposal is unchanged since the original version, except for the
>addition of Carl Steinbach as a Mentor.
>
>Please cast your votes:
>
>[] +1, accept Kudu into the Incubator
>[] +/-0, positive/negative non-counted expression of feelings
>[] -1, do not accept Kudu into the incubator (please state reasoning)
>
>Given the US holiday this week, I imagine many folks are traveling or
>otherwise offline. So, let's run the vote for a full week rather than the
>traditional 72 hours. Unless the IPMC objects to the extended voting
>period, the vote will close on Tues, Dec 1st at noon PST.
>
>Thanks
>-Todd
>-----
>
>= Kudu Proposal =
>
>== Abstract ==
>
>Kudu is a distributed columnar storage engine built for the Apache Hadoop
>ecosystem.
>
>== Proposal ==
>
>Kudu is an open source storage engine for structured data which supports
>low-latency random access together with efficient analytical access
>patterns. Kudu distributes data using horizontal partitioning and
>replicates each partition using Raft consensus, providing low
>mean-time-to-recovery and low tail latencies. Kudu is designed within the
>context of the Apache Hadoop ecosystem and supports many integrations with
>other data analytics projects both inside and outside of the Apache
>Software Foundation.
>
>
>
>We propose to incubate Kudu as a project of the Apache Software Foundation.
>
>== Background ==
>
>In recent years, explosive growth in the amount of data being generated and
>captured by enterprises has resulted in the rapid adoption of open source
>technology which is able to store massive data sets at scale and at low
>cost. In particular, the Apache Hadoop ecosystem has become a focal point
>for such “big data” workloads, because many traditional open source
>database systems have lagged in offering a scalable alternative.
>
>
>
>Structured storage in the Hadoop ecosystem has typically been achieved in
>two ways: for static data sets, data is typically stored on Apache HDFS
>using binary data formats such as Apache Avro or Apache Parquet. However,
>neither HDFS nor these formats has any provision for updating individual
>records, or for efficient random access. Mutable data sets are typically
>stored in semi-structured stores such as Apache HBase or Apache Cassandra.
>These systems allow for low-latency record-level reads and writes, but lag
>far behind the static file formats in terms of sequential read throughput
>for applications such as SQL-based analytics or machine learning.
>
>
>
>Kudu is a new storage system designed and implemented from the ground up to
>fill this gap between high-throughput sequential-access storage systems
>such as HDFS and low-latency random-access systems such as HBase or
>Cassandra. While these existing systems continue to hold advantages in some
>situations, Kudu offers a “happy medium” alternative that can dramatically
>simplify the architecture of many common workloads. In particular, Kudu
>offers a simple API for row-level inserts, updates, and deletes, while
>providing table scans at throughputs similar to Parquet, a commonly-used
>columnar format for static data.
>
>
>
>More information on Kudu can be found at the existing open source project
>website: http://getkudu.io and in particular in the Kudu white-paper PDF:
>http://getkudu.io/kudu.pdf from which the above was excerpted.
>
>== Rationale ==
>
>As described above, Kudu fills an important gap in the open source storage
>ecosystem. After our initial open source project release in September 2015,
>we have seen a great amount of interest across a diverse set of users and
>companies. We believe that, as a storage system, it is critical to build an
>equally diverse set of contributors in the development community. Our
>experiences as committers and PMC members on other Apache projects have
>taught us the value of diverse communities in ensuring both longevity and
>high quality for such foundational systems.
>
>== Initial Goals ==
>
> * Move the existing codebase, website, documentation, and mailing lists to
>Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
>review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
>
>== Current Status ==
>
>==== Releases ====
>
>Kudu has undergone one public release, tagged here
>https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
>This initial release was not performed in the typical ASF fashion -- no
>source tarball was released, but rather only convenience binaries made
>available in Cloudera’s repositories. We will adopt the ASF source release
>process upon joining the incubator.
>
>
>==== Source ====
>
>Kudu’s source is currently hosted on GitHub at
>https://github.com/cloudera/kudu
>
>This repository will be transitioned to Apache’s git hosting during
>incubation.
>
>
>
>==== Code review ====
>
>Kudu’s code reviews are currently public and hosted on Gerrit at
>http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
>The Kudu developer community is very happy with Gerrit and hopes to work
>with the Apache Infrastructure team to figure out how we can continue to
>use Gerrit within ASF policies.
>
>
>
>==== Issue tracking ====
>
>Kudu’s bug and feature tracking is hosted on JIRA at:
>https://issues.cloudera.org/projects/KUDU/summary
>
>This JIRA instance contains bugs and development discussion dating back 2
>years prior to Kudu’s open source release and will provide an initial seed
>for the ASF JIRA.
>
>
>
>==== Community discussion ====
>
>Kudu has several public discussion forums, linked here:
>http://getkudu.io/community.html
>
>
>
>==== Build Infrastructure ====
>
>The Kudu Gerrit instance is configured to only allow patches to be
>committed after running them through an extensive set of pre-commit tests
>and code lints. The project currently makes use of elastic public cloud
>resources to perform these tests. Until this point, these resources have
>been internal to Cloudera, though we are currently investing in moving to a
>publicly accessible infrastructure.
>
>
>
>==== Development practices ====
>
>Given that Kudu is a persistent storage engine, the community has a high
>quality bar for contributions to its core. We have a firm belief that high
>quality is achieved through automation, not manual inspection, and hence
>put a focus on thorough testing and build infrastructure to ensure that
>bar. The development community also practices review-then-commit for all
>changes to ensure that changes are accompanied by appropriate tests, are
>well commented, etc.
>
>Rather than seeing these practices as barriers to contribution, we believe
>that a fully automated and standardized review and testing practice makes
>it easier for new contributors to have patches accepted. Any new developer
>may post a patch to Gerrit using the same workflow as a seasoned
>contributor, and the same suite of tests will be automatically run. If the
>tests pass, a committer can quickly review and commit the contribution from
>their web browser.
>
>=== Meritocracy ===
>
>We believe strongly in meritocracy in electing committers and PMC members.
>We believe that contributions can come in forms other than just code: for
>example, one of our initial proposed committers has contributed solely in
>the area of project documentation. We will encourage contributions and
>participation of all types, and ensure that contributors are appropriately
>recognized.
>
>=== Community ===
>
>Though Kudu is relatively new as an open source project, it has already
>seen promising growth in its community across several organizations:
>
> * '''Cloudera''' is the original development sponsor for Kudu.
> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
>production use case, contributing code, benchmarks, feedback, and
>conference talks.
> * '''Intel''' has contributed optimizations related to their hardware
>technologies.
> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
>use case, and has been contributing bug reports and product feedback.
> * '''Dremio''' is working on integration with Apache Drill and exploring
>using Kudu in a production use case.
> * Several community-built Docker images, tutorials, and blog posts have
>sprouted up since Kudu’s release.
>
>
>
>By bringing Kudu to Apache, we hope to encourage further contribution from
>the above organizations as well as to engage new users and contributors in
>the community.
>
>=== Core Developers ===
>
>Kudu was initially developed as a project at Cloudera. Most of the
>contributions to date have been by developers employed by Cloudera.
>
>
>
>Many of the developers are committers or PMC members on other Apache
>projects.
>
>=== Alignment ===
>
>As a project in the big data ecosystem, Kudu is aligned with several other
>ASF projects. Kudu includes input/output format integration with Apache
>Hadoop, and this integration can also provide a bridge to Apache Spark. We
>are planning to integrate with Apache Hive in the near future. We also
>integrate closely with Cloudera Impala, which is also currently being
>proposed for incubation. We have also scheduled a hackathon with the Apache
>Drill team to work on integration with that query engine.
>
>== Known Risks ==
>
>=== Orphaned Products ===
>
>The risk of Kudu being abandoned is low. Cloudera has invested a great deal
>in the initial development of the project, and intends to grow its
>investment over time as Kudu becomes a product adopted by its customer
>base. Several other organizations are also experimenting with Kudu for
>production use cases which would live for many years.
>
>=== Inexperience with Open Source ===
>
>Kudu has been released in the open for less than two months. However, from
>our very first public announcement we have been committed to open-source
>style development:
>
> * our code reviews are fully public and documented on a mailing list
> * our daily development chatter is in a public chat room
> * we send out weekly “community status” reports highlighting news and
>contributions
> * we published our entire JIRA history and discuss bugs in the open
> * we published our entire Git commit history, going back three years (no
>squashing)
>
>
>
>Several of the initial committers are experienced open source developers,
>several being committers and/or PMC members on other ASF projects (Hadoop,
>HBase, Thrift, Flume, et al). Those who are not ASF committers have
>experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
>=== Homogenous Developers ===
>
>The initial committers are employees or former employees of Cloudera.
>However, the committers are spread across multiple offices (Palo Alto, San
>Francisco, Melbourne), so the team is familiar with working in a
>distributed environment across varied time zones.
>
>
>
>The project has received some contributions from developers outside of
>Cloudera, and is starting to attract a ''user'' community as well. We hope
>to continue to encourage contributions from these developers and community
>members and grow them into committers after they have had time to continue
>their contributions.
>
>=== Reliance on Salaried Developers ===
>
>As mentioned above, the majority of development up to this point has been
>sponsored by Cloudera. We have seen several community users participate in
>discussions who are hobbyists interested in distributed systems and
>databases, and hope that they will continue their participation in the
>project going forward.
>
>=== Relationships with Other Apache Products ===
>
>Kudu is currently related to the following other Apache projects:
>
> * Hadoop: Kudu provides MapReduce input/output formats for integration
> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
>and work is progressing on support for Spark DataFrames and Spark SQL.
>
>
>
>The Kudu team has reached out to several other Apache projects to start
>discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
>Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
>Kudu is already collaborating on ValueVector, a proposed TLP spinning out
>from the Apache Drill community.
>
>
>
>We look forward to continuing to integrate and collaborate with these
>communities.
>
>=== An Excessive Fascination with the Apache Brand ===
>
>Many of the initial committers are already experienced Apache committers,
>and understand the true value provided by the Apache Way and the principles
>of the ASF. We believe that this development and contribution model is
>especially appropriate for storage products, where Apache’s
>community-over-code philosophy ensures long-term viability and
>consensus-based participation.
>
>== Documentation ==
>
> * Documentation is written in AsciiDoc and committed in the Kudu source
>repository:
>
> * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
>above repository.
>
> * A LaTeX whitepaper is also published, and the source is available within
>the same repository.
> * APIs are documented within the source code as JavaDoc or C++-style
>documentation comments.
> * Many design documents are stored within the source code repository as
>text files next to the code being documented.
>
>== Source and Intellectual Property Submission Plan ==
>
>The Kudu codebase and web site are currently hosted on GitHub and will be
>transitioned to the ASF repositories during incubation. Kudu is already
>licensed under the Apache 2.0 license.
>
>
>
>Some portions of the code are imported from other open source projects
>under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
>other than the initial committers. These copyright notices are maintained
>in those files as well as in a top-level NOTICE.txt file. We believe this
>to be permissible under the license terms and ASF policies, and confirmed
>this via a recent thread on general@incubator.apache.org.
>
>
>
>The “Kudu” name is not a registered trademark, though before the initial
>release of the project, we performed a trademark search and Cloudera’s
>legal counsel deemed it acceptable in the context of a data storage engine.
>There exists an unrelated open source project by the same name related to
>deployments on Microsoft’s Azure cloud service. We have been in contact
>with legal counsel from Microsoft and have obtained their approval for the
>use of the Kudu name.
>
>
>
>Cloudera currently owns several domain names related to Kudu (getkudu.io,
>kududb.io, et al) which will be transferred to the ASF and redirected to
>the official page during incubation.
>
>
>
>Portions of Kudu are protected by pending or published patents owned by
>Cloudera. Given the protections already granted by the Apache License, we
>do not anticipate any explicit licensing or transfer of this intellectual
>property.
>
>== External Dependencies ==
>
>The full set of dependencies and licenses is listed in
>https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
>and summarized here:
>
> * '''Twitter Bootstrap''': Apache 2.0
> * '''d3''': BSD 3-clause
> * '''epoch JS library''': MIT
> * '''lz4''': BSD 2-clause
> * '''gflags''': BSD 3-clause
> * '''glog''': BSD 3-clause
> * '''gperftools''': BSD 3-clause
> * '''libev''': BSD 2-clause
> * '''squeasel''': MIT
> * '''protobuf''': BSD 3-clause
> * '''rapidjson''': MIT
> * '''snappy''': BSD 3-clause
> * '''trace-viewer''': BSD 3-clause
> * '''zlib''': zlib license
> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> * '''bitshuffle''': MIT
> * '''boost''': Boost license
> * '''curl''': MIT
> * '''libunwind''': MIT
> * '''nvml''': BSD 3-clause
> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> * '''openssl''': OpenSSL License (BSD-alike)
>
> * '''Guava''': Apache 2.0
> * '''StumbleUpon Async''': BSD
> * '''Apache Hadoop''': Apache 2.0
> * '''Apache log4j''': Apache 2.0
> * '''Netty''': Apache 2.0
> * '''slf4j''': MIT
> * '''Apache Commons''': Apache 2.0
> * '''murmur''': Apache 2.0
>
>
>'''Build/test-only dependencies''':
>
> * '''CMake''': BSD 3-clause
> * '''gcovr''': BSD 3-clause
> * '''gmock''': BSD 3-clause
> * '''Apache Maven''': Apache 2.0
> * '''JUnit''': EPL
> * '''Mockito''': MIT
>
>== Cryptography ==
>
>Kudu does not currently include any cryptography-related code.
>
>== Required Resources ==
>
>=== Mailing lists ===
>
> * private@kudu.incubator.apache.org (PMC)
> * commits@kudu.incubator.apache.org (git push emails)
> * issues@kudu.incubator.apache.org (JIRA issue feed)
> * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> * user@kudu.incubator.apache.org (User questions)
>
>
>=== Repository ===
>
> * git://git.apache.org/kudu
>
>=== Gerrit ===
>
>We hope to continue using Gerrit for our code review and commit workflow.
>The Kudu team has already been in contact with Jake Farrell to start
>discussions on how Gerrit can fit into the ASF. We know that several other
>ASF projects and podlings are also interested in Gerrit.
>
>
>
>If the Infrastructure team does not have the bandwidth to support Gerrit,
>we will continue to support our own instance of Gerrit for Kudu, and make
>the necessary integrations such that commits are properly authenticated and
>maintain sufficient provenance to uphold the ASF standards (e.g. via the
>solution adopted by the AsterixDB podling).
>
>== Issue Tracking ==
>
>We would like to import our current JIRA project into the ASF JIRA, such
>that our historical commit messages and code comments continue to reference
>the appropriate bug numbers.
>
>== Initial Committers ==
>
> * Adar Dembo adar@cloudera.com
> * Alex Feinberg alex@strlen.net
> * Andrew Wang wang@apache.org
> * Dan Burkert dan@cloudera.com
> * David Alves dralves@apache.org
> * Jean-Daniel Cryans jdcryans@apache.org
> * Mike Percy mpercy@apache.org
> * Misty Stanley-Jones misty@apache.org
> * Todd Lipcon todd@apache.org
>
>The initial list of committers was seeded by listing those contributors who
>have contributed 20 or more patches in the last 12 months, indicating that
>they are active and have achieved merit through participation on the
>project. We chose not to include other contributors who either have not yet
>contributed a significant number of patches, or whose contributions are far
>in the past and whom we don’t expect to be active within the ASF.
>
>== Affiliations ==
>
> * Adar Dembo - Cloudera
> * Alex Feinberg - Forward Networks
> * Andrew Wang - Cloudera
> * Dan Burkert - Cloudera
> * David Alves - Cloudera
> * Jean-Daniel Cryans - Cloudera
> * Mike Percy - Cloudera
> * Misty Stanley-Jones - Cloudera
> * Todd Lipcon - Cloudera
>
>== Sponsors ==
>
>=== Champion ===
>
> * Todd Lipcon
>
>=== Nominated Mentors ===
>
> * Jake Farrell - ASF Member and Infra team member, Acquia
> * Brock Noland - ASF Member, StreamSets
> * Michael Stack - ASF Member, Cloudera
> * Jarek Jarcec Cecho - ASF Member, Cloudera
> * Chris Mattmann - ASF Member, NASA JPL and USC
> * Julien Le Dem - Incubator PMC, Dremio
> * Carl Steinbach - ASF Member, LinkedIn
>
>=== Sponsoring Entity ===
>
>The Apache Incubator

