Posted to general@incubator.apache.org by Todd Lipcon <to...@apache.org> on 2015/11/17 19:32:27 UTC

[DISCUSS] Kudu incubator proposal

Hi all,

We'd like to start a discussion proposing the submission of Kudu to the
Apache Incubator.

The proposal is available on the Wiki here:
https://wiki.apache.org/incubator/KuduProposal
and pasted in this email for easy quoting during discussion.

Looking forward to hearing feedback!

-Todd
---------------------

= Kudu Proposal =

== Abstract ==

Kudu is a distributed columnar storage engine built for the Apache Hadoop
ecosystem.

== Proposal ==

Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. Kudu distributes data using horizontal partitioning and
replicates each partition using Raft consensus, providing low
mean-time-to-recovery and low tail latencies. Kudu is designed within the
context of the Apache Hadoop ecosystem and supports many integrations with
other data analytics projects both inside and outside of the Apache
Software Foundation.
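
As a rough illustration of the partition-and-replicate layout described
above, the following Python sketch maps row keys to partitions and assigns
each partition a set of replicas. It is not Kudu code: the hash-based
scheme, the partition count, the replication factor of 3, and the server
names are all assumptions chosen for the example, and the Raft protocol
itself is not modeled beyond the majority size it relies on.

```python
import hashlib
from typing import List

# All values below are hypothetical and chosen only for illustration.
NUM_PARTITIONS = 4
REPLICATION_FACTOR = 3
SERVERS = ["server-a", "server-b", "server-c", "server-d", "server-e"]


def partition_for(key: str) -> int:
    """Map a row key to a partition using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def replicas_for(partition: int) -> List[str]:
    """Place a partition's replicas on consecutive servers, wrapping around."""
    return [
        SERVERS[(partition + i) % len(SERVERS)]
        for i in range(REPLICATION_FACTOR)
    ]


def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replica_count // 2 + 1
```

With three replicas, a majority of two is enough to make progress, so a
partition can stay available while one of its replicas is down.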



We propose to incubate Kudu as a project of the Apache Software Foundation.

== Background ==

In recent years, explosive growth in the amount of data being generated and
captured by enterprises has resulted in the rapid adoption of open source
technology which is able to store massive data sets at scale and at low
cost. In particular, the Apache Hadoop ecosystem has become a focal point
for such “big data” workloads, because many traditional open source
database systems have lagged in offering a scalable alternative.



Structured storage in the Hadoop ecosystem has typically been achieved in
two ways: for static data sets, data is typically stored on Apache HDFS
using binary data formats such as Apache Avro or Apache Parquet. However,
neither HDFS nor these formats has any provision for updating individual
records, or for efficient random access. Mutable data sets are typically
stored in semi-structured stores such as Apache HBase or Apache Cassandra.
These systems allow for low-latency record-level reads and writes, but lag
far behind the static file formats in terms of sequential read throughput
for applications such as SQL-based analytics or machine learning.



Kudu is a new storage system designed and implemented from the ground up to
fill this gap between high-throughput sequential-access storage systems
such as HDFS and low-latency random-access systems such as HBase or
Cassandra. While these existing systems continue to hold advantages in some
situations, Kudu offers a “happy medium” alternative that can dramatically
simplify the architecture of many common workloads. In particular, Kudu
offers a simple API for row-level inserts, updates, and deletes, while
providing table scans at throughputs similar to Parquet, a commonly-used
columnar format for static data.
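
To make the combination of row-level mutation and columnar scanning more
concrete, here is a minimal in-memory sketch in Python. It is a toy under
stated assumptions, not the Kudu API: the class name ToyColumnTable, its
methods, and the "host"/"cpu" columns in the usage note are hypothetical,
and nothing here reflects Kudu's actual storage layout.

```python
from typing import Any, Dict, Iterator, List


class ToyColumnTable:
    """Hypothetical toy table: one list per column plus a primary-key index."""

    def __init__(self, key_column: str, columns: List[str]) -> None:
        if key_column not in columns:
            raise ValueError("key_column must be one of columns")
        self.key_column = key_column
        self.columns: Dict[str, List[Any]] = {name: [] for name in columns}
        self.row_index: Dict[Any, int] = {}  # primary key -> row position
        self.live: List[bool] = []           # False marks a deleted row

    def insert(self, row: Dict[str, Any]) -> None:
        key = row[self.key_column]
        if key in self.row_index:
            raise KeyError(f"duplicate key: {key!r}")
        self.row_index[key] = len(self.live)
        for name, values in self.columns.items():
            values.append(row[name])
        self.live.append(True)

    def update(self, key: Any, changes: Dict[str, Any]) -> None:
        if self.key_column in changes:
            raise ValueError("the key column cannot be updated")
        pos = self.row_index[key]
        for name, value in changes.items():
            self.columns[name][pos] = value

    def delete(self, key: Any) -> None:
        pos = self.row_index.pop(key)
        self.live[pos] = False

    def scan(self, column: str) -> Iterator[Any]:
        """Read one column sequentially, skipping deleted rows."""
        for value, alive in zip(self.columns[column], self.live):
            if alive:
                yield value
```

For example, after inserting rows for hosts "a" and "b", updating the cpu
value for "a", and deleting "b", a call to scan("cpu") yields only the
updated value for "a". The scan touches a single column list, which is the
property that makes columnar layouts attractive for analytics.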



More information on Kudu can be found at the existing open source project
website: http://getkudu.io and in particular in the Kudu white-paper PDF:
http://getkudu.io/kudu.pdf from which the above was excerpted.

== Rationale ==

As described above, Kudu fills an important gap in the open source storage
ecosystem. After our initial open source project release in September 2015,
we have seen a great amount of interest across a diverse set of users and
companies. We believe that, as a storage system, it is critical to build an
equally diverse set of contributors in the development community. Our
experiences as committers and PMC members on other Apache projects have
taught us the value of diverse communities in ensuring both longevity and
high quality for such foundational systems.

== Initial Goals ==

 * Move the existing codebase, website, documentation, and mailing lists to
Apache-hosted infrastructure
 * Work with the infrastructure team to implement and approve our code
review, build, and testing workflows in the context of the ASF
 * Incremental development and releases per Apache guidelines

== Current Status ==

==== Releases ====

Kudu has undergone one public release, tagged here:
https://github.com/cloudera/kudu/tree/kudu0.5.0-release

This initial release was not performed in the typical ASF fashion -- no
source tarball was released, but rather only convenience binaries made
available in Cloudera’s repositories. We will adopt the ASF source release
process upon joining the incubator.


==== Source ====

Kudu’s source is currently hosted on GitHub at
https://github.com/cloudera/kudu

This repository will be transitioned to Apache’s git hosting during
incubation.



==== Code review ====

Kudu’s code reviews are currently public and hosted on Gerrit at
http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu

The Kudu developer community is very happy with Gerrit and hopes to work
with the Apache Infrastructure team to figure out how we can continue to
use Gerrit within ASF policies.



==== Issue tracking ====

Kudu’s bug and feature tracking is hosted on JIRA at:
https://issues.cloudera.org/projects/KUDU/summary

This JIRA instance contains bugs and development discussion dating back 2
years prior to Kudu’s open source release and will provide an initial seed
for the ASF JIRA.



==== Community discussion ====

Kudu has several public discussion forums, linked here:
http://getkudu.io/community.html



==== Build Infrastructure ====

The Kudu Gerrit instance is configured to only allow patches to be
committed after running them through an extensive set of pre-commit tests
and code lints. The project currently makes use of elastic public cloud
resources to perform these tests. Until this point, these resources have
been internal to Cloudera, though we are currently investing in moving to a
publicly accessible infrastructure.



==== Development practices ====

Given that Kudu is a persistent storage engine, the community has a high
quality bar for contributions to its core. We have a firm belief that high
quality is achieved through automation, not manual inspection, and hence
put a focus on thorough testing and build infrastructure to ensure that
bar. The development community also practices review-then-commit for all
changes to ensure that changes are accompanied by appropriate tests, are
well commented, etc.

Rather than seeing these practices as barriers to contribution, we believe
that a fully automated and standardized review and testing practice makes
it easier for new contributors to have patches accepted. Any new developer
may post a patch to Gerrit using the same workflow as a seasoned
contributor, and the same suite of tests will be automatically run. If the
tests pass, a committer can quickly review and commit the contribution from
their web browser.

=== Meritocracy ===

We believe strongly in meritocracy in electing committers and PMC members.
We believe that contributions can come in forms other than just code: for
example, one of our initial proposed committers has contributed solely in
the area of project documentation. We will encourage contributions and
participation of all types, and ensure that contributors are appropriately
recognized.

=== Community ===

Though Kudu is relatively new as an open source project, it has already
seen promising growth in its community across several organizations:

 * '''Cloudera''' is the original development sponsor for Kudu.
 * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
production use case, contributing code, benchmarks, feedback, and
conference talks.
 * '''Intel''' has contributed optimizations related to their hardware
technologies.
 * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
use case, and has been contributing bug reports and product feedback.
 * '''Dremio''' is working on integration with Apache Drill and exploring
using Kudu in a production use case.
 * Several community-built Docker images, tutorials, and blog posts have
sprouted up since Kudu’s release.



By bringing Kudu to Apache, we hope to encourage further contribution from
the above organizations as well as to engage new users and contributors in
the community.

=== Core Developers ===

Kudu was initially developed as a project at Cloudera. Most of the
contributions to date have been by developers employed by Cloudera.



Many of the developers are committers or PMC members on other Apache
projects.

=== Alignment ===

As a project in the big data ecosystem, Kudu is aligned with several other
ASF projects. Kudu includes input/output format integration with Apache
Hadoop, and this integration can also provide a bridge to Apache Spark. We
are planning to integrate with Apache Hive in the near future. We also
integrate closely with Cloudera Impala, which is also currently being
proposed for incubation. We have also scheduled a hackathon with the Apache
Drill team to work on integration with that query engine.

== Known Risks ==

=== Orphaned Products ===

The risk of Kudu being abandoned is low. Cloudera has invested a great deal
in the initial development of the project, and intends to grow its
investment over time as Kudu becomes a product adopted by its customer
base. Several other organizations are also experimenting with Kudu for
production use cases which would live for many years.

=== Inexperience with Open Source ===

Kudu has been released in the open for less than two months. However, from
our very first public announcement we have been committed to open-source
style development:

 * our code reviews are fully public and documented on a mailing list
 * our daily development chatter is in a public chat room
 * we send out weekly “community status” reports highlighting news and
contributions
 * we published our entire JIRA history and discuss bugs in the open
 * we published our entire Git commit history, going back three years (no
squashing)



Several of the initial committers are experienced open source developers,
several being committers and/or PMC members on other ASF projects (Hadoop,
HBase, Thrift, Flume, et al). Those who are not ASF committers have
experience on non-ASF open source projects (Kiji, open-vm-tools, et al).

=== Homogenous Developers ===

The initial committers are employees or former employees of Cloudera.
However, the committers are spread across multiple offices (Palo Alto, San
Francisco, Melbourne), so the team is familiar with working in a
distributed environment across varied time zones.



The project has received some contributions from developers outside of
Cloudera, and is starting to attract a ''user'' community as well. We hope
to continue to encourage contributions from these developers and community
members and grow them into committers after they have had time to continue
their contributions.

=== Reliance on Salaried Developers ===

As mentioned above, the majority of development up to this point has been
sponsored by Cloudera. We have seen several community users participate in
discussions who are hobbyists interested in distributed systems and
databases, and hope that they will continue their participation in the
project going forward.

=== Relationships with Other Apache Products ===

Kudu is currently related to the following other Apache projects:

 * Hadoop: Kudu provides MapReduce input/output formats for integration
 * Spark: Kudu integrates with Spark via the above-mentioned input formats,
and work is progressing on support for Spark Data Frames and Spark SQL.



The Kudu team has reached out to several other Apache projects to start
discussing integrations, including Flume, Kafka, Hive, and Drill.



Kudu integrates with Impala, which is also being proposed for incubation.



Kudu is already collaborating on ValueVector, a proposed TLP spinning out
from the Apache Drill community.



We look forward to continuing to integrate and collaborate with these
communities.

=== An Excessive Fascination with the Apache Brand ===

Many of the initial committers are already experienced Apache committers,
and understand the true value provided by the Apache Way and the principles
of the ASF. We believe that this development and contribution model is
especially appropriate for storage products, where Apache’s
community-over-code philosophy ensures long term viability and
consensus-based participation.

== Documentation ==

 * Documentation is written in AsciiDoc and committed in the Kudu source
repository:
   * https://github.com/cloudera/kudu/tree/master/docs
 * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
above repository.
 * A LaTeX whitepaper is also published, and the source is available within
the same repository.
 * APIs are documented within the source code as JavaDoc or C++-style
documentation comments.
 * Many design documents are stored within the source code repository as
text files next to the code being documented.

== Source and Intellectual Property Submission Plan ==

The Kudu codebase and web site are currently hosted on GitHub and will be
transitioned to the ASF repositories during incubation. Kudu is already
licensed under the Apache 2.0 license.



Some portions of the code are imported from other open source projects
under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
other than the initial committers. These copyright notices are maintained
in those files as well as a top-level NOTICE.txt file. We believe this to
be permissible under the license terms and ASF policies, and we confirmed
this via a recent thread on general@incubator.apache.org.



The “Kudu” name is not a registered trademark, though before the initial
release of the project, we performed a trademark search and Cloudera’s
legal counsel deemed it acceptable in the context of a data storage engine.
There exists an unrelated open source project by the same name related to
deployments on Microsoft’s Azure cloud service. We have been in contact
with legal counsel from Microsoft and have obtained their approval for the
use of the Kudu name.



Cloudera currently owns several domain names related to Kudu (getkudu.io,
kududb.io, et al) which will be transferred to the ASF and redirected to
the official page during incubation.



Portions of Kudu are protected by pending or published patents owned by
Cloudera. Given the protections already granted by the Apache License, we
do not anticipate any explicit licensing or transfer of this intellectual
property.

== External Dependencies ==

The full set of dependencies and licenses is listed in
https://github.com/cloudera/kudu/blob/master/LICENSE.txt

and summarized here:

 * '''Twitter Bootstrap''': Apache 2.0
 * '''d3''': BSD 3-clause
 * '''epoch JS library''': MIT
 * '''lz4''': BSD 2-clause
 * '''gflags''': BSD 3-clause
 * '''glog''': BSD 3-clause
 * '''gperftools''': BSD 3-clause
 * '''libev''': BSD 2-clause
 * '''squeasel''': MIT
 * '''protobuf''': BSD 3-clause
 * '''rapidjson''': MIT
 * '''snappy''': BSD 3-clause
 * '''trace-viewer''': BSD 3-clause
 * '''zlib''': zlib license
 * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
 * '''bitshuffle''': MIT
 * '''boost''': Boost license
 * '''curl''': MIT
 * '''libunwind''': MIT
 * '''nvml''': BSD 3-clause
 * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
 * '''openssl''': OpenSSL License (BSD-alike)

 * '''Guava''': Apache 2.0
 * '''StumbleUpon Async''': BSD
 * '''Apache Hadoop''': Apache 2.0
 * '''Apache log4j''': Apache 2.0
 * '''Netty''': Apache 2.0
 * '''slf4j''': MIT
 * '''Apache Commons''': Apache 2.0
 * '''murmur''': Apache 2.0


'''Build/test-only dependencies''':

 * '''CMake''': BSD 3-clause
 * '''gcovr''': BSD 3-clause
 * '''gmock''': BSD 3-clause
 * '''Apache Maven''': Apache 2.0
 * '''JUnit''': EPL
 * '''Mockito''': MIT

== Cryptography ==

Kudu does not currently include any cryptography-related code.

== Required Resources ==

=== Mailing lists ===

 * private@kudu.incubator.apache.org (PMC)
 * commits@kudu.incubator.apache.org (git push emails)
 * issues@kudu.incubator.apache.org (JIRA issue feed)
 * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
 * user@kudu.incubator.apache.org (User questions)


=== Repository ===

 * git://git.apache.org/kudu

=== Gerrit ===

We hope to continue using Gerrit for our code review and commit workflow.
The Kudu team has already been in contact with Jake Farrell to start
discussions on how Gerrit can fit into the ASF. We know that several other
ASF projects and podlings are also interested in Gerrit.



If the Infrastructure team does not have the bandwidth to support Gerrit,
we will continue to support our own instance of Gerrit for Kudu, and make
the necessary integrations such that commits are properly authenticated and
maintain sufficient provenance to uphold the ASF standards (e.g. via the
solution adopted by the AsterixDB podling).

=== Issue Tracking ===

We would like to import our current JIRA project into the ASF JIRA, such
that our historical commit messages and code comments continue to reference
the appropriate bug numbers.

== Initial Committers ==

 * Adar Dembo adar@cloudera.com
 * Alex Feinberg alex@strlen.net
 * Andrew Wang wang@apache.org
 * Dan Burkert dan@cloudera.com
 * David Alves dralves@apache.org
 * Jean-Daniel Cryans jdcryans@apache.org
 * Mike Percy mpercy@apache.org
 * Misty Stanley-Jones misty@apache.org
 * Todd Lipcon todd@apache.org

The initial list of committers was seeded by listing those contributors who
have contributed 20 or more patches in the last 12 months, indicating that
they are active and have achieved merit through participation on the
project. We chose not to include other contributors who either have not yet
contributed a significant number of patches, or whose contributions are far
in the past and we don’t expect to be active within the ASF.

== Affiliations ==

 * Adar Dembo - Cloudera
 * Alex Feinberg - Forward Networks
 * Andrew Wang - Cloudera
 * Dan Burkert - Cloudera
 * David Alves - Cloudera
 * Jean-Daniel Cryans - Cloudera
 * Mike Percy - Cloudera
 * Misty Stanley-Jones - Cloudera
 * Todd Lipcon - Cloudera

== Sponsors ==

=== Champion ===

 * Todd Lipcon

=== Nominated Mentors ===

 * Jake Farrell - ASF Member and Infra team member, Acquia
 * Brock Noland - ASF Member, StreamSets
 * Michael Stack - ASF Member, Cloudera
 * Jarek Jarcec Cecho - ASF Member, Cloudera
 * Chris Mattmann - ASF Member, NASA JPL and USC
 * Julien Le Dem - Incubator PMC, Dremio

=== Sponsoring Entity ===

The Apache Incubator

Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@apache.org>.
On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org> wrote:

>
> > For now, I think "meritocracy" should be followed -- when contributors
> > demonstrate sufficient merit, we can add them as committers. Note that
> > there are plenty of my coworkers who have made small contributions in the
> > past, and they aren't listed as contributors either.
>
> So, you're saying that people were chosen to be listed or not as the
> contributors merely by the amount of the code they have contributed to the
> project. Am I reading this right?
>

As stated in the proposal:

"The initial list of committers was seeded by listing those contributors
who have contributed 20 or more patches in the last 12 months, indicating
that they are active and have achieved merit through participation on the
project. We chose not to include other contributors who either have not yet
contributed a significant number of patches, or whose contributions are far
in the past and we don’t expect to be active within the ASF."
Note that, since our documentation and web site are also versioned in git,
this includes one initial committer who has not written any code but has
been a major contributor to our docs and site.

This is well aligned with the process I've seen on the many ASF projects
I've been involved with. The ASF "get involved" page[1] says:

"Apache is a meritocracy. That is, once someone has shown sufficient
sustained committment to a project by helping out and contributing work to
the project (and the ASF) may be voted in by the project as a committer.

...[snipped]... If your work shows merit, the PMC for the project may hold
a vote to invite you to become a committer.
Note that becoming a committer is not just about submitting some patches;
it's also about helping out on the development and user discussion lists,
helping with documentation and the issue tracker, and showing long-term
interest. "

What we're proposing for Kudu is to follow the above model: show long-term
interest and sufficient sustained commitment, and you can become a
committer. While I appreciate that some incubator projects are happy to add
people without the above criteria, we'd prefer to follow the meritocratic
model as described above.

-Todd

[1] http://www.apache.org/foundation/getinvolved.html

Re: [DISCUSS] Kudu incubator proposal

Posted by Henry Robinson <he...@cloudera.com>.
And we'd be pleased to hear your advice over on our [DISCUSS] thread :)

On 17 November 2015 at 16:59, Henry Saputra <he...@gmail.com> wrote:

> You were trying to comment on Impala proposal =P
>
> On Tue, Nov 17, 2015 at 4:56 PM, Marvin Humphrey <ma...@rectangular.com>
> wrote:
> > On Tue, Nov 17, 2015 at 4:53 PM, Marvin Humphrey <ma...@rectangular.com>
> wrote:
> >> On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
> >
> >> I agree that this prospective podling is going to have a lot of work
> >> to do, and I think that a more diverse Mentor corps is badly needed.
> >> But those are separate issues.
> >
> > Bah, I had the Mentor list for Kudu confused with the other proposal
> > from today. That critique doesn't apply here.
> >
> > Marvin
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
>


-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: [DISCUSS] Kudu incubator proposal

Posted by Henry Saputra <he...@gmail.com>.
You were trying to comment on Impala proposal =P

On Tue, Nov 17, 2015 at 4:56 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Tue, Nov 17, 2015 at 4:53 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org> wrote:
>
>> I agree that this prospective podling is going to have a lot of work
>> to do, and I think that a more diverse Mentor corps is badly needed.
>> But those are separate issues.
>
> Bah, I had the Mentor list for Kudu confused with the other proposal
> from today. That critique doesn't apply here.
>
> Marvin
>


Re: [DISCUSS] Kudu incubator proposal

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 17, 2015 at 4:53 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org> wrote:

> I agree that this prospective podling is going to have a lot of work
> to do, and I think that a more diverse Mentor corps is badly needed.
> But those are separate issues.

Bah, I had the Mentor list for Kudu confused with the other proposal
from today. That critique doesn't apply here.

Marvin


Re: [DISCUSS] Kudu incubator proposal

Posted by Konstantin Boudnik <co...@apache.org>.
On Mon, Nov 23, 2015 at 11:17AM, Chris Mattmann wrote:
> Hi Alex,
> 
> We don’t have to hear from every author. If we don’t have an ICLA
> or SGA on file for them, we’ll remove their code from the initial

Yup, that's the concern I was trying to address by asking to reach out to the
contributors. If we simply can do this at the IP-check time - fine.

Thanks for taking time explaining this, Chris.
  Cos

> import. BTW, that’s a few steps away. We’re also rehashing stuff
> that has been discussed many times over, frankly.
> 
> Cheers,
> Chris
> 
> 
> 
> -----Original Message-----
> From: Alex Harui <ah...@adobe.com>
> Reply-To: "general@incubator.apache.org" <ge...@incubator.apache.org>
> Date: Monday, November 23, 2015 at 10:46 AM
> To: "general@incubator.apache.org" <ge...@incubator.apache.org>
> Subject: Re: [DISCUSS] Kudu incubator proposal
> 
> >
> >
> >On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
> ><ch...@jpl.nasa.gov> wrote:
> >
> >>Alex, 
> >>
> >>Please re-read my email. As I stated we don’t take code that
> >>authors don’t want us to have. So far, we haven’t heard from any of
> >>the authors on the incoming Kudu project that that’s the case. If
> >>it’s not the case, we go by the license of the project which stipulates
> >>how code can be copied, modified, reused, etc.
> >
> >Yes, but my interpretation of your words is that folks have to opt out,
> >not the other way around.  I thought the "take rule" meant that folks have
> >to opt in via SGA/CLA or for minor stuff, I'd think an email to dev@ would
> >suffice.
> >
> >So it isn't whether you haven't heard from any of the authors, it is
> >whether you have heard from every author.
> >
> >Thanks,
> >-Alex
> >
> 
> 
> 
> 

Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@apache.org>.
On Sun, Nov 22, 2015 at 12:28 PM, Konstantin Boudnik <co...@apache.org> wrote:

>
> Basing the "diversity" on affiliaiton (or any other arbitrary property) is
> quite bogus - I am with you and Roy on this. All I want to make sure that
> people who contributed code and, perhaps, became inactive for a period of
> time, are aware about the project trying to enter the incubation. Which
> might
> trigger their wish to participate again.
>
> Hopefully, the third time to clarify my initial point would be a charm ;)
>
>
Third time's the charm :)

I sent an email to our existing kudu-dev and kudu-user mailing lists:
https://groups.google.com/forum/#!topic/kudu-user/_0RfyNHWPZE

I also forwarded the email along with a specific note to encourage
continued contribution to those who have contributed patches in the past
but were not listed as initial committers.

Any other suggestions for the proposal? Otherwise it seems like
discussion's winding down and I'll call a VOTE on Monday or Tuesday.

-Todd

Re: [DISCUSS] Kudu incubator proposal

Posted by Ted Dunning <te...@gmail.com>.
Since the contributors were employed at Cloudera, they probably signed an
invention assignment.  That means Cloudera can sign an SGA.

On Wed, Nov 25, 2015 at 11:39 AM, Greg Stein <gs...@gmail.com> wrote:

> On Mon, Nov 23, 2015 at 12:46 PM, Alex Harui <ah...@adobe.com> wrote:
>
> > On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
> > <ch...@jpl.nasa.gov> wrote:
> >
> > >Alex,
> > >
> > >Please re-read my email. As I stated we don’t take code that
> > >authors don’t want us to have. So far, we haven’t heard from any of
> > >the authors on the incoming Kudu project that that’s the case. If
> > >it’s not the case, we go by the license of the project which stipulates
> > >how code can be copied, modified, reused, etc.
> >
> > Yes, but my interpretation of your words is that folks have to opt out,
> >
>
> Correct: opt-out.
>
> Since this code is under ALv2, we can import it to the ASF under that
> license. We have always done stuff like this, including other permissive
> licenses.
>
> But this isn't simply importing a library, this is saying "the ASF is now
> the primary locus of development for >this< code." And that's where people
> can say, "woah. I hate you guys. don't develop my code there", and so we
> nuke it.
>
> SGA/iCLA is to give us rights that we otherwise wouldn't have (ie. the code
> was under a different license).
>
> Cheers,
> -g
>

Re: [DISCUSS] Kudu incubator proposal

Posted by Owen O'Malley <om...@apache.org>.
On Tue, Nov 24, 2015 at 7:39 PM, Greg Stein <gs...@gmail.com> wrote:

> On Mon, Nov 23, 2015 at 12:46 PM, Alex Harui <ah...@adobe.com> wrote:
>
> > On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
> > <ch...@jpl.nasa.gov> wrote:
> >
> > >Alex,
> > >
> > >Please re-read my email. As I stated we don’t take code that
> > >authors don’t want us to have. So far, we haven’t heard from any of
> > >the authors on the incoming Kudu project that that’s the case. If
> > >it’s not the case, we go by the license of the project which stipulates
> > >how code can be copied, modified, reused, etc.
> >
> > Yes, but my interpretation of your words is that folks have to opt out,
> >
>
> Correct: opt-out.
>
> Since this code is under ALv2, we can import it to the ASF under that
> license. We have always done stuff like this, including other permissive
> licenses.
>
> But this isn't simply importing a library, this is saying "the ASF is now
> the primary locus of development for >this< code." And that's where people
> can say, "woah. I hate you guys. don't develop my code there", and so we
> nuke it.
>
> SGA/iCLA is to give us rights that we otherwise wouldn't have (ie. the code
> was under a different license).
>

It is worth looking back at the thread on Bloodhound:
http://mail-archives.apache.org/mod_mbox/incubator-general/201201.mbox/%3C0F2EA54E-4419-428F-A604-46EF59C40469%40gbiv.com%3E

The important thing is that Apache doesn't fork communities. In this case,
the community wants to move to Apache. That is great and should be allowed.
They shouldn't need to get an explicit permission from each contributor
over the years.

.. Owen

Re: [DISCUSS] Kudu incubator proposal

Posted by Greg Stein <gs...@gmail.com>.
On Mon, Nov 23, 2015 at 12:46 PM, Alex Harui <ah...@adobe.com> wrote:

> On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
> <ch...@jpl.nasa.gov> wrote:
>
> >Alex,
> >
> >Please re-read my email. As I stated we don’t take code that
> >authors don’t want us to have. So far, we haven’t heard from any of
> >the authors on the incoming Kudu project that that’s the case. If
> >it’s not the case, we go by the license of the project which stipulates
> >how code can be copied, modified, reused, etc.
>
> Yes, but my interpretation of your words is that folks have to opt out,
>

Correct: opt-out.

Since this code is under ALv2, we can import it to the ASF under that
license. We have always done stuff like this, including other permissive
licenses.

But this isn't simply importing a library, this is saying "the ASF is now
the primary locus of development for >this< code." And that's where people
can say, "woah. I hate you guys. don't develop my code there", and so we
nuke it.

SGA/iCLA is to give us rights that we otherwise wouldn't have (ie. the code
was under a different license).

Cheers,
-g

Re: [DISCUSS] Kudu incubator proposal

Posted by Chris Mattmann <ma...@apache.org>.
Hi Alex,

We don’t have to hear from every author. If we don’t have an ICLA
or SGA on file for them, we’ll remove their code from the initial
import. BTW, that’s a few steps away. We’re also rehashing stuff
that has been discussed many times over, frankly.

Cheers,
Chris



-----Original Message-----
From: Alex Harui <ah...@adobe.com>
Reply-To: "general@incubator.apache.org" <ge...@incubator.apache.org>
Date: Monday, November 23, 2015 at 10:46 AM
To: "general@incubator.apache.org" <ge...@incubator.apache.org>
Subject: Re: [DISCUSS] Kudu incubator proposal

>
>
>On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
><ch...@jpl.nasa.gov> wrote:
>
>>Alex, 
>>
>>Please re-read my email. As I stated we don’t take code that
>>authors don’t want us to have. So far, we haven’t heard from any of
>>the authors on the incoming Kudu project that that’s the case. If
>>it’s not the case, we go by the license of the project which stipulates
>>how code can be copied, modified, reused, etc.
>
>Yes, but my interpretation of your words is that folks have to opt out,
>not the other way around.  I thought the "take rule" meant that folks have
>to opt in via SGA/CLA or for minor stuff, I'd think an email to dev@ would
>suffice.
>
>So it isn't whether you haven't heard from any of the authors, it is
>whether you have heard from every author.
>
>Thanks,
>-Alex
>



---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] Kudu incubator proposal

Posted by Alex Harui <ah...@adobe.com>.

On 11/23/15, 8:23 AM, "Mattmann, Chris A (3980)"
<ch...@jpl.nasa.gov> wrote:

>Alex, 
>
>Please re-read my email. As I stated we don’t take code that
>authors don’t want us to have. So far, we haven’t heard from any of
>the authors on the incoming Kudu project that that’s the case. If
>it’s not the case, we go by the license of the project which stipulates
>how code can be copied, modified, reused, etc.

Yes, but my interpretation of your words is that folks have to opt out,
not the other way around.  I thought the "take rule" meant that folks have
to opt in via SGA/CLA or for minor stuff, I'd think an email to dev@ would
suffice.

So it isn't whether you haven't heard from any of the authors, it is
whether you have heard from every author.

Thanks,
-Alex


Re: [DISCUSS] Kudu incubator proposal

Posted by Niall Pemberton <ni...@gmail.com>.
On Mon, Nov 23, 2015 at 4:23 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Alex,
>
> Please re-read my email. As I stated we don’t take code that
> authors don’t want us to have.


Surely it depends on how that work was licensed before the ASF?

    http://www.apache.org/legal/src-headers.html#3party

Niall


> So far, we haven’t heard from any of
> the authors on the incoming Kudu project that that’s the case. If
> it’s not the case, we go by the license of the project which stipulates
> how code can be copied, modified, reused, etc.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
> -----Original Message-----
> From: Alex Harui <ah...@adobe.com>
> Reply-To: "general@incubator.apache.org" <ge...@incubator.apache.org>
> Date: Monday, November 23, 2015 at 8:14 AM
> To: "general@incubator.apache.org" <ge...@incubator.apache.org>
> Subject: Re: [DISCUSS] Kudu incubator proposal
>
> >
> >
> >On 11/22/15, 12:51 PM, "Mattmann, Chris A (3980)"
> ><ch...@jpl.nasa.gov> wrote:
> >
> >>If they have code contributions part of this code base, that they don’t
> >>want included, they can state that. It was my understanding this code base
> >>was Apache License, version 2, beforehand, thus we have the ability to
> >>include and modify their code as part of an ALv2 licensed code base, that
> >>is being brought to the ASF.
> >
> >I thought there was a "rule" that Apache projects don't "take" code,
> >regardless of license.  IOW, that every line of code must be "donated" to the
> >ASF via SGA or CLA.  That would imply that unless the contributors for
> >every line of code in the incoming code base have documented their
> >donation, the code base will need documentation (via LICENSE and
> >maybe comments in the files and headers) as to what AL code remains
> >third-party and which is officially part of the ASF project.  Sure, the
> >fact that the code is under AL lets you include and modify, but I think
> >that without such documentation that code is "bundled" and third-party.  I
> >thought the default is that it isn't "included" and they need to state
> >that it is ok for it to be "included".
> >
> >Otherwise, there's a pile of Google Code and GitHub projects I'm going to
> >grab for my project.
> >
> >Thanks,
> >-Alex
> >
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >For additional commands, e-mail: general-help@incubator.apache.org
>
>

Re: [DISCUSS] Kudu incubator proposal

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Alex, 

Please re-read my email. As I stated we don’t take code that
authors don’t want us to have. So far, we haven’t heard from any of
the authors on the incoming Kudu project that that’s the case. If
it’s not the case, we go by the license of the project which stipulates
how code can be copied, modified, reused, etc.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Alex Harui <ah...@adobe.com>
Reply-To: "general@incubator.apache.org" <ge...@incubator.apache.org>
Date: Monday, November 23, 2015 at 8:14 AM
To: "general@incubator.apache.org" <ge...@incubator.apache.org>
Subject: Re: [DISCUSS] Kudu incubator proposal

>
>
>On 11/22/15, 12:51 PM, "Mattmann, Chris A (3980)"
><ch...@jpl.nasa.gov> wrote:
>
>>If they have code contributions part of this code base, that they don’t
>>want included, they can state that. It was my understanding this code base
>>was Apache License, version 2, beforehand, thus we have the ability to
>>include and modify their code as part of an ALv2 licensed code base, that
>>is being brought to the ASF.
>
>I thought there was a "rule" that Apache projects don't "take" code,
>regardless of license.  IOW, that every line of code must be "donated" to the
>ASF via SGA or CLA.  That would imply that unless the contributors for
>every line of code in the incoming code base have documented their
>donation, the code base will need documentation (via LICENSE and
>maybe comments in the files and headers) as to what AL code remains
>third-party and which is officially part of the ASF project.  Sure, the
>fact that the code is under AL lets you include and modify, but I think
>that without such documentation that code is "bundled" and third-party.  I
>thought the default is that it isn't "included" and they need to state
>that it is ok for it to be "included".
>
>Otherwise, there's a pile of Google Code and GitHub projects I'm going to
>grab for my project.
>
>Thanks,
>-Alex
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] Kudu incubator proposal

Posted by Alex Harui <ah...@adobe.com>.

On 11/22/15, 12:51 PM, "Mattmann, Chris A (3980)"
<ch...@jpl.nasa.gov> wrote:

>If they have code contributions part of this code base, that they don’t
>want included, they can state that. It was my understanding this code base
>was Apache License, version 2, beforehand, thus we have the ability to
>include and modify their code as part of an ALv2 licensed code base, that
>is being brought to the ASF.

I thought there was a "rule" that Apache projects don't "take" code,
regardless of license.  IOW, that every line of code must be "donated" to the
ASF via SGA or CLA.  That would imply that unless the contributors for
every line of code in the incoming code base have documented their
donation, the code base will need documentation (via LICENSE and
maybe comments in the files and headers) as to what AL code remains
third-party and which is officially part of the ASF project.  Sure, the
fact that the code is under AL lets you include and modify, but I think
that without such documentation that code is "bundled" and third-party.  I
thought the default is that it isn't "included" and they need to state
that it is ok for it to be "included".

Otherwise, there's a pile of Google Code and GitHub projects I'm going to
grab for my project.

Thanks,
-Alex




Re: [DISCUSS] Kudu incubator proposal

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Cos,

-----Original Message-----

From: Konstantin Boudnik <co...@apache.org>
Reply-To: <ge...@incubator.apache.org>
Date: Sunday, November 22, 2015 at 12:28 PM
To: <ge...@incubator.apache.org>
Subject: Re: [DISCUSS] Kudu incubator proposal

>On Sun, Nov 22, 2015 at 08:19PM, Mattmann, Chris A (3980) wrote:
>> Hi,
>> 
>> Todd has answered this question already - that folks will be invited
>> based on their contributions made during Incubation. The mentors on
>> this project have a history of being inclusive, so I think we’re fine
>> on that.
>> 
>> As for inviting people based on their company affiliation, this has
>> been dealt with plenty of times in the Incubator, most recently
>> Roy’s comments I believe in 2012 which summarize this - diversity,
>> while to be encouraged, is not a strict requirement of Incubation.
>
>Chris, I couldn't care less about where the contributors are coming from.
>Basing the "diversity" on affiliation (or any other arbitrary property) is
>quite bogus - I am with you and Roy on this. All I want is to make sure that
>people who contributed code and, perhaps, became inactive for a period of
>time are aware that the project is trying to enter incubation, which might
>trigger their wish to participate again.

Todd and the initial contributors listed on this proposal are the ones who
came to me and the other mentors about bringing this project to Apache.
If there are other folks who contributed to the project prior to Apache,
good for them, and if they want to be part of the community, that’s between
them and, quite honestly, Todd and the initial contributors. There is no
requirement to include them on this proposal. If they have code contributions
that are part of this code base and that they don’t want included, they can
state that. It was my understanding that this code base was under the Apache
License, version 2, beforehand; thus we have the ability to include and modify
their code as part of an ALv2-licensed code base that is being brought to the
ASF.

Community wise and regardless of the license, if someone who contributed
prior wasn’t included in this proposal and wanted their code removed,
I would assume Todd and the members of the initial PPMC would honor
their wishes since we only want code that wants to be included in this
project. Recall also - we are forming a community here - which is a set of
people that can and desire to work together.

>
>Hopefully, the third time to clarify my initial point would be a charm ;)

Clarified. That said, see above and my own point.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




>
>> Cheers,
>> Chris
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Konstantin Boudnik <co...@apache.org>
>> Reply-To: <ge...@incubator.apache.org>
>> Date: Sunday, November 22, 2015 at 12:14 PM
>> To: <ge...@incubator.apache.org>
>> Subject: Re: [DISCUSS] Kudu incubator proposal
>> 
>> >My point exactly, thanks Henry!
>> >
>> >On Tue, Nov 17, 2015 at 10:51PM, Henry Saputra wrote:
>> >> Hi Todd,
>> >> 
>> >> One concern, other IPMCs could help correct me if I am wrong, for
>> >> project that already open source and accepting contributions from
>> >> individuals which not part of initial committers is that it needs to
>> >> get the consent or grant from those contributors when moving to ASF.
>> >> Unless, the individuals have already signed the one for Cloudera when
>> >> contributing to Kudu.
>> >> 
>> >> Want to make sure the software grant concern be addressed early.
>> >> 
>> >> Thanks,
>> >> 
>> >> - Henry
>> >> 
>> >> On Tue, Nov 17, 2015 at 6:47 PM, Todd Lipcon <to...@apache.org> wrote:
>> >> > On Tue, Nov 17, 2015 at 6:36 PM, Luke Han <lu...@gmail.com> wrote:
>> >> >
>> >> >> In "community" section of this proposal, there are many companies
>> >> >> have been mentioned including Xiaomi, Dropbox, Intel and Dremio,
>> >> >> and said there are contributions from them.
>> >> >>
>> >> >> I think their engineers are more interesting and be involved
>> >> >> in Kudu actively, why not think about to invite them to be committer first?
>> >> >>
>> >> >
>> >> > Per earlier messages on the thread and per the proposal: because they
>> >> > haven't yet shown significant contributions over a sustained period.
>> >> > Interest is one thing, but interest is not sufficient to become a committer
>> >> > in a meritocracy. The folks called out in the community section are those
>> >> > who have interest and are somewhere on the path towards committership, but
>> >> > aren't there yet. I have every hope (and expectation) that they will get
>> >> > there if they keep up their good work.
>> >> >
>> >> > I would rather not discuss specific people's progress towards committership
>> >> > in a public forum. Rather, I hoped that we could start with the proposed
>> >> > committer list, and during incubation we can discuss potential new
>> >> > committers and PMC members following the typical ASF processes on our
>> >> > PPMC's private list. I'm counting on our experienced mentors to keep us all
>> >> > honest -- they should absolutely call us out if the initial committers act
>> >> > exclusionary or otherwise violate the ASF principles of being an inclusive
>> >> > meritocracy.
>> >> >
>> >> > -Todd
>> >> 
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >> For additional commands, e-mail: general-help@incubator.apache.org
>> >> 
>> 


Re: [DISCUSS] Kudu incubator proposal

Posted by Konstantin Boudnik <co...@apache.org>.
On Sun, Nov 22, 2015 at 08:19PM, Mattmann, Chris A (3980) wrote:
> Hi,
> 
> Todd has answered this question already - that folks will be invited
> based on their contributions made during Incubation. The mentors on
> this project have a history of being inclusive, so I think we’re fine
> on that.
> 
> As for inviting people based on their company affiliation, this has
> been dealt with plenty of times in the Incubator, most recently
> Roy’s comments I believe in 2012 which summarize this - diversity,
> while to be encouraged, is not a strict requirement of Incubation.

Chris, I couldn't care less about where the contributors are coming from.
Basing the "diversity" on affiliation (or any other arbitrary property) is
quite bogus - I am with you and Roy on this. All I want is to make sure that
people who contributed code and, perhaps, became inactive for a period of
time are aware that the project is trying to enter incubation, which might
trigger their wish to participate again.

Hopefully, the third time to clarify my initial point would be a charm ;)

Regards,
  Cos

> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Konstantin Boudnik <co...@apache.org>
> Reply-To: <ge...@incubator.apache.org>
> Date: Sunday, November 22, 2015 at 12:14 PM
> To: <ge...@incubator.apache.org>
> Subject: Re: [DISCUSS] Kudu incubator proposal
> 
> >My point exactly, thanks Henry!
> >
> >On Tue, Nov 17, 2015 at 10:51PM, Henry Saputra wrote:
> >> Hi Todd,
> >> 
> >> One concern, other IPMCs could help correct me if I am wrong, for
> >> project that already open source and accepting contributions from
> >> individuals which not part of initial committers is that it needs to
> >> get the consent or grant from those contributors when moving to ASF.
> >> Unless, the individuals have already signed the one for Cloudera when
> >> contributing to Kudu.
> >> 
> >> Want to make sure the software grant concern be addressed early.
> >> 
> >> Thanks,
> >> 
> >> - Henry
> >> 
> >> On Tue, Nov 17, 2015 at 6:47 PM, Todd Lipcon <to...@apache.org> wrote:
> >> > On Tue, Nov 17, 2015 at 6:36 PM, Luke Han <lu...@gmail.com> wrote:
> >> >
> >> >> In "community" section of this proposal, there are many companies
> >> >> have been mentioned including Xiaomi, Dropbox, Intel and Dremio,
> >> >> and said there are contributions from them.
> >> >>
> >> >> I think their engineers are more interesting and be involved
> >> >> in Kudu actively, why not think about to invite them to be committer first?
> >> >>
> >> >
> >> > Per earlier messages on the thread and per the proposal: because they
> >> > haven't yet shown significant contributions over a sustained period.
> >> > Interest is one thing, but interest is not sufficient to become a committer
> >> > in a meritocracy. The folks called out in the community section are those
> >> > who have interest and are somewhere on the path towards committership, but
> >> > aren't there yet. I have every hope (and expectation) that they will get
> >> > there if they keep up their good work.
> >> >
> >> > I would rather not discuss specific people's progress towards committership
> >> > in a public forum. Rather, I hoped that we could start with the proposed
> >> > committer list, and during incubation we can discuss potential new
> >> > committers and PMC members following the typical ASF processes on our
> >> > PPMC's private list. I'm counting on our experienced mentors to keep us all
> >> > honest -- they should absolutely call us out if the initial committers act
> >> > exclusionary or otherwise violate the ASF principles of being an inclusive
> >> > meritocracy.
> >> >
> >> > -Todd
> >> 
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >> 
> 

Re: [DISCUSS] Kudu incubator proposal

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi,

Todd has answered this question already - that folks will be invited
based on their contributions made during Incubation. The mentors on
this project have a history of being inclusive, so I think we’re fine
on that.

As for inviting people based on their company affiliation, this has
been dealt with plenty of times in the Incubator, most recently
Roy’s comments I believe in 2012 which summarize this - diversity,
while to be encouraged, is not a strict requirement of Incubation.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Konstantin Boudnik <co...@apache.org>
Reply-To: <ge...@incubator.apache.org>
Date: Sunday, November 22, 2015 at 12:14 PM
To: <ge...@incubator.apache.org>
Subject: Re: [DISCUSS] Kudu incubator proposal

>My point exactly, thanks Henry!
>
>On Tue, Nov 17, 2015 at 10:51PM, Henry Saputra wrote:
>> Hi Todd,
>> 
>> One concern, other IPMCs could help correct me if I am wrong, for
>> project that already open source and accepting contributions from
>> individuals which not part of initial committers is that it needs to
>> get the consent or grant from those contributors when moving to ASF.
>> Unless, the individuals have already signed the one for Cloudera when
>> contributing to Kudu.
>> 
>> Want to make sure the software grant concern be addressed early.
>> 
>> Thanks,
>> 
>> - Henry
>> 
>> On Tue, Nov 17, 2015 at 6:47 PM, Todd Lipcon <to...@apache.org> wrote:
>> > On Tue, Nov 17, 2015 at 6:36 PM, Luke Han <lu...@gmail.com> wrote:
>> >
>> >> In "community" section of this proposal, there are many companies
>> >> have been mentioned including Xiaomi, Dropbox, Intel and Dremio,
>> >> and said there are contributions from them.
>> >>
>> >> I think their engineers are more interesting and be involved
>> >> in Kudu actively, why not think about to invite them to be committer first?
>> >>
>> >
>> > Per earlier messages on the thread and per the proposal: because they
>> > haven't yet shown significant contributions over a sustained period.
>> > Interest is one thing, but interest is not sufficient to become a committer
>> > in a meritocracy. The folks called out in the community section are those
>> > who have interest and are somewhere on the path towards committership, but
>> > aren't there yet. I have every hope (and expectation) that they will get
>> > there if they keep up their good work.
>> >
>> > I would rather not discuss specific people's progress towards committership
>> > in a public forum. Rather, I hoped that we could start with the proposed
>> > committer list, and during incubation we can discuss potential new
>> > committers and PMC members following the typical ASF processes on our
>> > PPMC's private list. I'm counting on our experienced mentors to keep us all
>> > honest -- they should absolutely call us out if the initial committers act
>> > exclusionary or otherwise violate the ASF principles of being an inclusive
>> > meritocracy.
>> >
>> > -Todd
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 


Re: [DISCUSS] Kudu incubator proposal

Posted by Konstantin Boudnik <co...@apache.org>.
My point exactly, thanks Henry!

On Tue, Nov 17, 2015 at 10:51PM, Henry Saputra wrote:
> Hi Todd,
> 
> One concern, other IPMCs could help correct me if I am wrong, for
> project that already open source and accepting contributions from
> individuals which not part of initial committers is that it needs to
> get the consent or grant from those contributors when moving to ASF.
> Unless, the individuals have already signed the one for Cloudera when
> contributing to Kudu.
> 
> Want to make sure the software grant concern be addressed early.
> 
> Thanks,
> 
> - Henry
> 
> On Tue, Nov 17, 2015 at 6:47 PM, Todd Lipcon <to...@apache.org> wrote:
> > On Tue, Nov 17, 2015 at 6:36 PM, Luke Han <lu...@gmail.com> wrote:
> >
> >> In "community" section of this proposal, there are many companies
> >> have been mentioned including Xiaomi, Dropbox, Intel and Dremio,
> >> and said there are contributions from them.
> >>
> >> I think their engineers are more interesting and be involved
> >> in Kudu actively, why not think about to invite them to be committer first?
> >>
> >
> > Per earlier messages on the thread and per the proposal: because they
> > haven't yet shown significant contributions over a sustained period.
> > Interest is one thing, but interest is not sufficient to become a committer
> > in a meritocracy. The folks called out in the community section are those
> > who have interest and are somewhere on the path towards committership, but
> > aren't there yet. I have every hope (and expectation) that they will get
> > there if they keep up their good work.
> >
> > I would rather not discuss specific people's progress towards committership
> > in a public forum. Rather, I hoped that we could start with the proposed
> > committer list, and during incubation we can discuss potential new
> > committers and PMC members following the typical ASF processes on our
> > PPMC's private list. I'm counting on our experienced mentors to keep us all
> > honest -- they should absolutely call us out if the initial committers act
> > exclusionary or otherwise violate the ASF principles of being an inclusive
> > meritocracy.
> >
> > -Todd
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
> 

Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Nov 17, 2015 at 10:51 PM, Henry Saputra <he...@gmail.com>
wrote:

> Hi Todd,
>
> One concern, other IPMCs could help correct me if I am wrong, for
> project that already open source and accepting contributions from
> individuals which not part of initial committers is that it needs to
> get the consent or grant from those contributors when moving to ASF.
> Unless, the individuals have already signed the one for Cloudera when
> contributing to Kudu.
>
>
I believe that all contributors outside of Cloudera employees have signed
CLAs with Cloudera (either individual ones, or via their employers). That
said, if we need to track people down to get more explicit consent to
contribute code to Apache, I'm sure folks should be fine with it.

I'll await further instruction from the IPMC on this one.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: [DISCUSS] Kudu incubator proposal

Posted by Henry Saputra <he...@gmail.com>.
Hi Todd,

One concern (other IPMC members, please correct me if I am wrong): a
project that is already open source and accepts contributions from
individuals who are not among the initial committers needs to get
consent or a grant from those contributors when moving to the ASF,
unless those individuals have already signed one for Cloudera when
contributing to Kudu.

I want to make sure the software grant concern is addressed early.

Thanks,

- Henry

On Tue, Nov 17, 2015 at 6:47 PM, Todd Lipcon <to...@apache.org> wrote:
> On Tue, Nov 17, 2015 at 6:36 PM, Luke Han <lu...@gmail.com> wrote:
>
>> In "community" section of this proposal, there are many companies
>> have been mentioned including Xiaomi, Dropbox, Intel and Dremio,
>> and said there are contributions from them.
>>
>> I think their engineers are more interesting and be involved
>> in Kudu actively, why not think about to invite them to be committer first?
>>
>
> Per earlier messages on the thread and per the proposal: because they
> haven't yet shown significant contributions over a sustained period.
> Interest is one thing, but interest is not sufficient to become a committer
> in a meritocracy. The folks called out in the community section are those
> who have interest and are somewhere on the path towards committership, but
> aren't there yet. I have every hope (and expectation) that they will get
> there if they keep up their good work.
>
> I would rather not discuss specific people's progress towards committership
> in a public forum. Rather, I hoped that we could start with the proposed
> committer list, and during incubation we can discuss potential new
> committers and PMC members following the typical ASF processes on our
> PPMC's private list. I'm counting on our experienced mentors to keep us all
> honest -- they should absolutely call us out if the initial committers act
> exclusionary or otherwise violate the ASF principles of being an inclusive
> meritocracy.
>
> -Todd



Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@apache.org>.
On Tue, Nov 17, 2015 at 6:36 PM, Luke Han <lu...@gmail.com> wrote:

> In "community" section of this proposal, there are many companies
> have been mentioned including Xiaomi, Dropbox, Intel and Dremio,
> and said there are contributions from them.
>
> I think their engineers are more interesting and be involved
> in Kudu actively, why not think about to invite them to be committer first?
>

Per earlier messages on the thread and per the proposal: because they
haven't yet shown significant contributions over a sustained period.
Interest is one thing, but interest is not sufficient to become a committer
in a meritocracy. The folks called out in the community section are those
who have interest and are somewhere on the path towards committership, but
aren't there yet. I have every hope (and expectation) that they will get
there if they keep up their good work.

I would rather not discuss specific people's progress towards committership
in a public forum. Rather, I hoped that we could start with the proposed
committer list, and during incubation we can discuss potential new
committers and PMC members following the typical ASF processes on our
PPMC's private list. I'm counting on our experienced mentors to keep us all
honest -- they should absolutely call us out if the initial committers act
exclusionary or otherwise violate the ASF principles of being an inclusive
meritocracy.

-Todd

Re: [DISCUSS] Kudu incubator proposal

Posted by Luke Han <lu...@gmail.com>.
In the "Community" section of this proposal, many companies are
mentioned, including Xiaomi, Dropbox, Intel and Dremio, and it says
there are contributions from them.

I think their engineers are more interested and actively involved in
Kudu, so why not think about inviting them to be committers first?

Just my 2 cents:)

Thanks.




> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>



Best Regards!
---------------------

Luke Han

On Wed, Nov 18, 2015 at 10:19 AM, Todd Lipcon <to...@apache.org> wrote:

> On Tue, Nov 17, 2015 at 6:04 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
>
> > On Tue, Nov 17, 2015 at 04:53PM, Marvin Humphrey wrote:
> > > On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org>
> > wrote:
> > >
> > > > So, you're saying that people were chosen to be listed or not as the
> > > > contributors merely by the amount of the code they have contributed
> to
> > the
> > > > project. Am I reading this right?
> > >
> > > We've had this debate about committer cattle call additions many
> > > times. The position that Todd is taking is completely reasonable. The
> > > expectation that just about anybody can join a project during the
> > > proposal phase is messed up and I wish that tradition had never caught
> >
> > That's not my point, Marvin. The people who contributed less than 20
> > commits
> > (hmm, why not 4 or a 107?) are still contributors. And in my opinion,
> they
> > at
> > least have to be invited to participate in the podling, if it is accepted
> > by
> > IPMC. So, I will re-phrase: "was an invitation to participate in the
> > project
> > extended to all contributors?".
> >
> > Shall it be done formally or by providing "Interested Party" is an
> > implementation detail.
> >
> >
> We haven't formally extended any invitation to these people to continue
> participating in the project at the ASF. Those who are active in the
> project I fully anticipate will continue to be active and work their way
> towards committership. Others who contributed in the past but whom we
> haven't seen in 12+ months are of course welcome to come back to the
> project. In that case, I think it would be an easy vote to committership.
>
> If anyone is interested in the project, feel free to edit the wiki and add
> an "Interested Parties" section. I haven't seen that one before on other
> proposals, and I'm not sure what it accomplishes. The whole nature of the ASF
> is that no explicit "invitations to participate" are necessary. Everyone is
> by default invited to participate and contribute. To make that explicit,
> though, I'll make sure to send out a note to all of our previous
> contributors once we're accepted for incubation.
>
> The reason that we elected to include the "active in the last 12 months"
> criterion was to avoid creating a project with a super-long list of employees
> of a single company. Seeing such a list can be discouraging for new folks --
> both because of the "wall of single employer" effect and because newcomers
> to the community are likely to be confused why these people have been made
> committers when they have never once participated inside the ASF.
>
> If the IPMC at large feels that the above reasoning is inappropriate, we
> can change the proposal to include a few more committers -- there's a small
> handful of folks who made significant contributions to the project early on
> that are no longer active. I don't imagine these people will end up
> contributing or voting on releases, though, so it seems like an artificial
> "stuffing" of the committer list.
>
> Thanks
> -Todd
>

Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@apache.org>.
On Tue, Nov 17, 2015 at 6:04 PM, Konstantin Boudnik <co...@apache.org> wrote:

> On Tue, Nov 17, 2015 at 04:53PM, Marvin Humphrey wrote:
> > On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
> >
> > > So, you're saying that people were chosen to be listed or not as the
> > > contributors merely by the amount of the code they have contributed to
> the
> > > project. Am I reading this right?
> >
> > We've had this debate about committer cattle call additions many
> > times. The position that Todd is taking is completely reasonable. The
> > expectation that just about anybody can join a project during the
> > proposal phase is messed up and I wish that tradition had never caught
>
> That's not my point, Marvin. The people who contributed less than 20
> commits
> (hmm, why not 4 or a 107?) are still contributors. And in my opinion, they
> at
> least have to be invited to participate in the podling, if it is accepted
> by
> IPMC. So, I will re-phrase: "was an invitation to participate in the
> project
> extended to all contributors?".
>
> Shall it be done formally or by providing "Interested Party" is an
> implementation detail.
>
>
We haven't formally extended any invitation to these people to continue
participating in the project at the ASF. Those who are active in the
project I fully anticipate will continue to be active and work their way
towards committership. Others who contributed in the past but whom we
haven't seen in 12+ months are of course welcome to come back to the
project. In that case, I think it would be an easy vote to committership.

If anyone is interested in the project, feel free to edit the wiki and add
an "Interested Parties" section. I haven't seen that one before on other
proposals, and I'm not sure what it accomplishes. The whole nature of the ASF
is that no explicit "invitations to participate" are necessary. Everyone is
by default invited to participate and contribute. To make that explicit,
though, I'll make sure to send out a note to all of our previous
contributors once we're accepted for incubation.

The reason that we elected to include the "active in the last 12 months"
criterion was to avoid creating a project with a super-long list of employees
of a single company. Seeing such a list can be discouraging for new folks --
both because of the "wall of single employer" effect and because newcomers
to the community are likely to be confused why these people have been made
committers when they have never once participated inside the ASF.

If the IPMC at large feels that the above reasoning is inappropriate, we
can change the proposal to include a few more committers -- there's a small
handful of folks who made significant contributions to the project early on
that are no longer active. I don't imagine these people will end up
contributing or voting on releases, though, so it seems like an artificial
"stuffing" of the committer list.

Thanks
-Todd

Re: [DISCUSS] Kudu incubator proposal

Posted by Konstantin Boudnik <co...@apache.org>.
On Tue, Nov 17, 2015 at 04:53PM, Marvin Humphrey wrote:
> On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> > So, you're saying that people were chosen to be listed or not as the
> > contributors merely by the amount of the code they have contributed to the
> > project. Am I reading this right?
> 
> We've had this debate about committer cattle call additions many
> times. The position that Todd is taking is completely reasonable. The
> expectation that just about anybody can join a project during the
> proposal phase is messed up and I wish that tradition had never caught

That's not my point, Marvin. The people who contributed fewer than 20 commits
(hmm, why not 4 or 107?) are still contributors. And in my opinion, they at
least have to be invited to participate in the podling, if it is accepted by
the IPMC. So, I will rephrase: "was an invitation to participate in the project
extended to all contributors?".

Whether it is done formally or by providing an "Interested Party" section is
an implementation detail.

Cos

> on. Instead, there ought to be an "Interested Party" section in the
> proposal template where people can express interest and "subscribe to
> your newsletter".
> 

Re: [DISCUSS] Kudu incubator proposal

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 17, 2015 at 4:42 PM, Konstantin Boudnik <co...@apache.org> wrote:

> So, you're saying that people were chosen to be listed or not as the
> contributors merely by the amount of the code they have contributed to the
> project. Am I reading this right?

We've had this debate about committer cattle call additions many
times. The position that Todd is taking is completely reasonable. The
expectation that just about anybody can join a project during the
proposal phase is messed up and I wish that tradition had never caught
on. Instead, there ought to be an "Interested Party" section in the
proposal template where people can express interest and "subscribe to
your newsletter".

I agree that this prospective podling is going to have a lot of work
to do, and I think that a more diverse Mentor corps is badly needed.
But those are separate issues.

Marvin Humphrey



Re: [DISCUSS] Kudu incubator proposal

Posted by Konstantin Boudnik <co...@apache.org>.
On Tue, Nov 17, 2015 at 04:33PM, Todd Lipcon wrote:
> >
> >
> > > Hi Atri,
> > >
> > > Thanks for the interest! Following the example of other recent incubator
> > > projects, we would like to keep the initial committer list to those who
> > > already have a track record of contributions to the project. We'd love to
> > > have you involved as a contributor during incubation, and of course would
> > > be glad to add you as a committer after you've become a regular
> > contributor.
> >
> > Considering that the project has been closed-sourced until very recently
> > such
> > "following" would surely create pretty high entry barrier for new
> > committers.
> > Which of course would be a concern, from the community growth perspective.
> >
> 
> Sure, I expect the IPMC and our mentors to hold us to reasonable
> expectations during incubation.
> 
> For now, I think "meritocracy" should be followed -- when contributors
> demonstrate sufficient merit, we can add them as committers. Note that
> there are plenty of my coworkers who have made small contributions in the
> past, and they aren't listed as contributors either.

So, you're saying that people were chosen to be listed or not as the
contributors merely by the amount of the code they have contributed to the
project. Am I reading this right?

Cos

Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@cloudera.com>.
>
>
> > Hi Atri,
> >
> > Thanks for the interest! Following the example of other recent incubator
> > projects, we would like to keep the initial committer list to those who
> > already have a track record of contributions to the project. We'd love to
> > have you involved as a contributor during incubation, and of course would
> > be glad to add you as a committer after you've become a regular
> contributor.
>
> Considering that the project has been closed-sourced until very recently
> such
> "following" would surely create pretty high entry barrier for new
> committers.
> Which of course would be a concern, from the community growth perspective.
>

Sure, I expect the IPMC and our mentors to hold us to reasonable
expectations during incubation.

For now, I think "meritocracy" should be followed -- when contributors
demonstrate sufficient merit, we can add them as committers. Note that
there are plenty of my coworkers who have made small contributions in the
past, and they aren't listed as contributors either.

Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: [DISCUSS] Kudu incubator proposal

Posted by Jake Farrell <jf...@apache.org>.
Merit is merit; why would the barrier for new committers be different here
than in any other project? If the ramp-up and time to learn the project's
source is the barrier, then it is on us to help make it easier through
documentation, a clear project roadmap, and entry-level consumable tickets to
help those new contributors get a footing.

-Jake

On Tue, Nov 17, 2015 at 7:27 PM, Konstantin Boudnik <co...@apache.org> wrote:

> On Tue, Nov 17, 2015 at 10:43AM, Todd Lipcon wrote:
> > On Tue, Nov 17, 2015 at 10:38 AM, Atri Sharma <at...@gmail.com>
> wrote:
> >
> > > Sounds great.
> > >
> > > I would love to help as a committer, if possible. This seems to be
> > > fantastically in line with my focus areas and can help existing big data
> > > projects accelerate, so Kudu's growth is something I would care about.
> > >
> >
> > Hi Atri,
> >
> > Thanks for the interest! Following the example of other recent incubator
> > projects, we would like to keep the initial committer list to those who
> > already have a track record of contributions to the project. We'd love to
> > have you involved as a contributor during incubation, and of course would
> > be glad to add you as a committer after you've become a regular
> contributor.
>
> Considering that the project has been closed-sourced until very recently
> such
> "following" would surely create pretty high entry barrier for new
> committers.
> Which of course would be a concern, from the community growth perspective.
>
> Cos
>

Re: [DISCUSS] Kudu incubator proposal

Posted by Konstantin Boudnik <co...@apache.org>.
On Tue, Nov 17, 2015 at 10:43AM, Todd Lipcon wrote:
> On Tue, Nov 17, 2015 at 10:38 AM, Atri Sharma <at...@gmail.com> wrote:
> 
> > Sounds great.
> >
> > I would love to help as a committer, if possible. This seems to be
> > fantastically in line with my focus areas and can help existing big data
> > projects accelerate, so Kudu's growth is something I would care about.
> >
> 
> Hi Atri,
> 
> Thanks for the interest! Following the example of other recent incubator
> projects, we would like to keep the initial committer list to those who
> already have a track record of contributions to the project. We'd love to
> have you involved as a contributor during incubation, and of course would
> be glad to add you as a committer after you've become a regular contributor.

Considering that the project has been closed-source until very recently, such
"following" would surely create a pretty high entry barrier for new committers,
which of course would be a concern from the community growth perspective.

Cos

Re: [DISCUSS] Kudu incubator proposal

Posted by Todd Lipcon <to...@apache.org>.
On Tue, Nov 17, 2015 at 10:38 AM, Atri Sharma <at...@gmail.com> wrote:

> Sounds great.
>
> I would love to help as a committer, if possible. This seems to be
> fantastically in line with my focus areas and can help existing big data
> projects accelerate, so Kudu's growth is something I would care about.
>

Hi Atri,

Thanks for the interest! Following the example of other recent incubator
projects, we would like to keep the initial committer list to those who
already have a track record of contributions to the project. We'd love to
have you involved as a contributor during incubation, and of course would
be glad to add you as a committer after you've become a regular contributor.

Thanks
Todd


Re: [DISCUSS] Kudu incubator proposal

Posted by Atri Sharma <at...@gmail.com>.
Sounds great.

I would love to help as a committer, if possible. This seems to be
fantastically in line with my focus areas and can help existing big data
projects accelerate, so Kudu's growth is something I would care about.

On Wed, Nov 18, 2015 at 12:02 AM, Todd Lipcon <to...@apache.org> wrote:

> Hi all,
>
> We'd like to start a discussion proposing the submission of Kudu to the
> Apache Incubator.
>
> The proposal is available on the Wiki here:
> https://wiki.apache.org/incubator/KuduProposal
> and pasted in this email for easy quoting during discussion.
>
> Looking forward to hearing feedback!
>
> -Todd
> ---------------------
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
>
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
>
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
>
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
>
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
>
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> production use case, contributing code, benchmarks, feedback, and
> conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
> technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
> using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
> sprouted up since Kudu’s release.
>
>
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
>
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects. Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
> contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
> squashing)
>
>
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
>
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> and work is progressing on support for Spark Data Frames and Spark SQL.
>
>
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
>
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
>
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
>
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF. We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
> repository:
>
>  * https://github.com/cloudera/kudu/tree/master/docs
>
>
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
> the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
> documentation comments.
>  * Many design documents are stored within the source code repository as
> text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site is currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
>
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org .
>
>
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service. We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
>
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
>
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses are listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
>
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project. We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and whom we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>



-- 
Regards,

Atri
*l'apprenant*