Posted to general@incubator.apache.org by Jun Rao <ju...@gmail.com> on 2011/06/28 19:00:32 UTC

[VOTE] Kafka to join the Incubator

Hi all,


Since the discussion on the thread of the Kafka incubator proposal is
winding down, I'd like to call a vote.

At the end of this mail, I've put a copy of the current proposal.  Here is
a link to the document in the wiki:
http://wiki.apache.org/incubator/KafkaProposal

And here is a link to the discussion thread:
http://www.mail-archive.com/general@incubator.apache.org/msg29594.html

Please cast your votes:

[  ] +1 Accept Kafka for incubation
[  ] +0 Indifferent to Kafka incubation
[  ]  -1 Reject Kafka for incubation

This vote will close 72 hours from now.

Thanks,

Jun

== Abstract ==
Kafka is a distributed publish-subscribe system for processing large amounts
of streaming data.

== Proposal ==
Kafka provides an extremely high-throughput distributed publish/subscribe
messaging system.  Additionally, it supports relatively long-term
persistence of messages to support a wide variety of consumers, partitioning
of the message stream across servers and consumers, and functionality for
loading data into Apache Hadoop for offline batch processing.
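
As an illustration of partitioning the message stream across servers, the
sketch below assigns each message to a partition by hashing a key. It is a
minimal sketch, not Kafka's actual partitioner: the class name
PartitionSketch, the method partitionFor, and the choice of hashing the UTF-8
bytes of a string key are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartitionSketch {
    // Hypothetical helper (not a Kafka API): maps a message key to one of
    // numPartitions partitions so that equal keys always land together.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        // Illustrative keys only; real deployments choose their own keys.
        for (String key : new String[] {"member-1", "member-2", "member-3"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, every
producer that uses the same function sends a given key to the same partition,
which is what lets consumers split the stream among themselves.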

== Background ==
Kafka was developed at LinkedIn to process the large amounts of events
generated by that company's website and provide a common repository for many
types of consumers to access and process those events. Kafka has been used
in production at LinkedIn scale to handle dozens of types of events
including page views, searches and social network activity. Kafka clusters
at LinkedIn currently process more than two billion events per day.

Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
provide low latency message delivery but don't focus on throughput, and log
processing systems such as Scribe and Flume, which do not provide adequate
latency for our diverse set of consumers.  Kafka can also be inserted into
traditional log-processing systems, acting as an intermediate step before
further processing. Kafka focuses relentlessly on performance and throughput
by neither inspecting message content nor indexing messages on the broker.
We also achieve high performance by depending on Java's sendFile/transferTo
capabilities to minimize intermediate buffer copies and relying on the OS's
pagecache to efficiently serve up message contents to consumers. Kafka is
also designed to be scalable, and it depends on Apache ZooKeeper for
coordination amongst its producers, brokers and consumers.
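
The transferTo capability mentioned above is exposed in Java as
FileChannel.transferTo, which lets the operating system move bytes from a
file to another channel without copying them through an application buffer
where the platform supports it. The sketch below is a hedged illustration
rather than Kafka's broker code: the class ZeroCopySketch, the method
sendSegment, and the temporary file used in main are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ZeroCopySketch {
    // Hypothetical helper: sends `count` bytes of a log file, starting at
    // `position`, to `target`, and returns how many bytes were transferred.
    static long sendSegment(Path logFile, long position, long count,
                            WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(logFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached, or target accepted nothing
                }
                sent += n;
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a message log segment; a broker would use a socket channel as target.
        Path tmp = Files.createTempFile("segment", ".log");
        Files.write(tmp, "hello kafka".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long sent = sendSegment(tmp, 6, 5, Channels.newChannel(out));
        System.out.println(sent + " bytes: " + out.toString("UTF-8"));
        Files.delete(tmp);
    }
}
```

Whether a true zero-copy path is taken depends on the operating system and on
the kind of target channel; with an in-memory target such as the one in main,
the JDK falls back to an ordinary copy, but the calling code stays the same.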

Kafka is written in Scala. It was developed internally at LinkedIn to meet
our particular use cases, but will be useful to many organizations facing a
similar need to reliably process large amounts of streaming data.
Therefore, we would like to share it with the ASF and begin developing a
community of developers and users within Apache.

== Rationale ==
Many organizations can benefit from a reliable stream processing system such
as Kafka.  While our use case of processing events from a very large website
like LinkedIn has driven the design of Kafka, its uses are varied and we
expect many new use cases to emerge.  Kafka provides a natural bridge
between near real-time event processing and offline batch processing and
will appeal to many users.

== Current Status ==
=== Meritocracy ===
Our intent with this incubator proposal is to start building a diverse
developer community around Kafka following the Apache meritocracy model.
Since Kafka was open sourced we have solicited contributions via the website
and presentations given to user groups and technical audiences.  We have had
positive responses to these and have received several contributions and
clients for other languages.  We plan to continue this support for new
contributors and work with those who contribute significantly to the project
to make them committers.

=== Community ===
Kafka is currently being developed by engineers within LinkedIn and is
used in production at that company. Additionally, we have active users in or
have received contributions from a diverse set of companies including
MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
presentations of Kafka and its goals garnered much interest from potential
contributors. We hope to extend our contributor base significantly and
invite all those who are interested in building high-throughput distributed
systems to participate.  We have begun receiving contributions from outside
of LinkedIn, among them clients for several languages including Ruby, PHP,
Clojure, .NET and Python.

To further this goal, we use GitHub's issue tracking and branching
facilities and maintain a public mailing list via Google Groups.

=== Core Developers ===
Kafka is currently being developed by four engineers at LinkedIn: Neha
Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
Apache as a Cassandra committer and PMC member. Neha has been an active
contributor to several projects LinkedIn has open sourced, including Bobo,
Sensei and Zoie. Jay has experience with open source software as the
originator of Project Voldemort, as well as being active within
the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
member and a previous Apache ZooKeeper contributor.

=== Alignment ===
The ASF is the natural choice to host the Kafka project as its goal of
encouraging community-driven open-source projects fits with our vision for
Kafka.  Additionally, many other projects that we are familiar with
and expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
and log4j, are hosted by the ASF, and we will both benefit from and provide
benefit through close proximity to them.

== Known Risks ==
=== Orphaned Products ===
The core developers plan to work full time on the project. There is very
little risk of Kafka being abandoned as it is a critical part of LinkedIn's
internal infrastructure and is in production use.

=== Inexperience with Open Source ===
All of the core developers have experience with open source development.
LinkedIn open sourced Kafka several months ago and has been receiving
contributions since.  Jun is an Apache Cassandra committer and PMC member.
Jay and Neha have been involved with several open source projects released
by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
Hadoop committer and PMC member.

=== Homogeneous Developers ===
The current core developers are all from LinkedIn. However, we hope to
establish a developer community that includes contributors from several
corporations and we are actively encouraging new contributors via the mailing
lists and public presentations of Kafka.

=== Reliance on Salaried Developers ===
Currently, the developers are paid to do work on Kafka. However, once the
project has a community built around it, we expect to gain committers,
developers and community members from outside the current core group. That
said, because LinkedIn relies on Kafka internally, the reliance on salaried
developers is unlikely to change.

=== Relationships with Other Apache Products ===
Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
to coordinate its state amongst the brokers, consumers, and soon, the
producers.  Kafka provides input formats to allow Hadoop MapReduce to load
data directly from Kafka.  Kafka provides an appender to allow consuming
data directly from Apache log4j.

=== An Excessive Fascination with the Apache Brand ===
While we respect the reputation of the Apache brand and have no doubts that
it will attract contributors and users, our interest is primarily to give
Kafka a solid home as an open source project following an established
development model. We have also given reasons in the Rationale and Alignment
sections.

== Documentation ==
Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
following links provide more information about the project:

 * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
 * The GitHub site: [https://github.com/kafka-dev/kafka]
 * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
 * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
 * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]

== Initial Source ==
Kafka has been under development at LinkedIn since November 2009.  It was
open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
under the Apache license at [https://github.com/kafka-dev/kafka]

Kafka is mainly written in Scala with some performance testing code in Java.
Several clients have been contributed in other languages, including Ruby,
PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
and relies on Simple Build Tool (sbt) as its build system and dependency
resolution mechanism.

== External Dependencies ==
The dependencies all have Apache-compatible licenses.

== Cryptography ==
Not applicable.

== Required Resources ==
=== Mailing Lists ===
 * kafka-private for private PMC discussions (with moderated subscriptions)
 * kafka-dev
 * kafka-commits
 * kafka-user

=== Subversion Directory ===
[https://svn.apache.org/repos/asf/incubator/kafka]

=== Issue Tracking ===
JIRA Kafka (KAFKA)

=== Other Resources ===
The existing code already has unit tests, so we would like a Hudson instance
to run them whenever a new patch is submitted. This can be added after
project creation.

== Initial Committers ==
 * Jay Kreps
 * Jun Rao
 * Neha Narkhede
 * Jakob Homan
 * Phillip Rhodes
 * Henry Saputra
 * Chris Burroughs

== Affiliations ==
 * Jay Kreps (LinkedIn)
 * Jun Rao (LinkedIn)
 * Neha Narkhede (LinkedIn)
 * Jakob Homan (LinkedIn)
 * Phillip Rhodes (Fogbeam Labs)
 * Henry Saputra (Cisco Systems)
 * Chris Burroughs (Clearspring Technologies)

== Sponsors ==
=== Champion ===
Chris Douglas (Apache Member)

=== Nominated Mentors ===
 * Alan Cabrera (Apache Member)
 * Geir Magnusson, Jr. (Apache Member and Director)
 * Owen O'Malley (Apache Member)

=== Sponsoring Entity ===
We are requesting the Incubator to sponsor this project.

Re: [VOTE] Kafka to join the Incubator

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

+1 (non-binding)


-- 
Regards,
Shalin Shekhar Mangar.

Re: [VOTE] Kafka to join the Incubator

Posted by Nigel Daley <nd...@mac.com>.

+1 (binding)



Re: [VOTE] Kafka to join the Incubator

Posted by Chris Douglas <cd...@apache.org>.

+1 (binding) -C

On Tue, Jun 28, 2011 at 7:00 AM, Jun Rao <ju...@gmail.com> wrote:
> Hi all,
>
>
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
>
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
>
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>
> Please cast your votes:
>
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
>
> This vote will close 72 hours from now.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
>
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by not introspecting into message content, nor indexing them on the broker.
>  We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
>
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
>  Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
>
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and
> used in production in that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
>
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper
> and log4j are hosted by the ASF and we will benefit and provide benefit by
> close proximity to them.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
>  LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
>  Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
>
>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on github
> under the Apache license at [https://github.com/kafka-dev/kafka]
>
> Kafka is mainly written in Scala with some performance testing code in Java.
>  Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * kafka-private for private PMC discussions (with moderated subscriptions)
>  * kafka-dev
>  * kafka-commits
>  * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
>
> == Initial Committers ==
>  * Jay Kreps
>  * Jun Rao
>  * Neha Narkhede
>  * Jakob Homan
>  * Phillip Rhodes
>  * Henry Saputra
>  * Chris Burroughs
>
> == Affiliations ==
>  * Jay Kreps (LinkedIn)
>  * Jun Rao (LinkedIn)
>  * Neha Narkhede (LinkedIn)
>  * Jakob Homan (LinkedIn)
>  * Phillip Rhodes (Fogbeam Labs)
>  * Henry Saputra (Cisco Systems)
>  * Chris Burroughs (Clearspring Technologies)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
>  * Alan Cabrera (Apache Member)
>  * Geir Magnusson, Jr. (Apache Member and Director)
>  * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>


Re: [VOTE] Kafka to join the Incubator

Posted by Niklas Gustavsson <ni...@protocol7.com>.

On Tue, Jun 28, 2011 at 7:00 PM, Jun Rao <ju...@gmail.com> wrote:
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.

+1

/niklas


Re: [VOTE] Kafka to join the Incubator

Posted by Hrishikesh Barua <ta...@gmail.com>.

+1

On 28 June 2011 22:30, Jun Rao <ju...@gmail.com> wrote:

> Hi all,
>
>
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
>
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
>
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>
> Please cast your votes:
>
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
>
> This vote will close 72 hours from now.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
>
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by not introspecting into message content, nor indexing them on the broker.
>  We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
>
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
>  Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
>
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and
> used in production in that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
>
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper
> and log4j are hosted by the ASF and we will benefit and provide benefit by
> close proximity to them.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
>  LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
>  Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
>
>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on github
> under the Apache license at [https://github.com/kafka-dev/kafka]
>
> Kafka is mainly written in Scala with some performance testing code in Java.
>  Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * kafka-private for private PMC discussions (with moderated subscriptions)
>  * kafka-dev
>  * kafka-commits
>  * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
>
> == Initial Committers ==
>  * Jay Kreps
>  * Jun Rao
>  * Neha Narkhede
>  * Jakob Homan
>  * Phillip Rhodes
>  * Henry Saputra
>  * Chris Burroughs
>
> == Affiliations ==
>  * Jay Kreps (LinkedIn)
>  * Jun Rao (LinkedIn)
>  * Neha Narkhede (LinkedIn)
>  * Jakob Homan (LinkedIn)
>  * Phillip Rhodes (Fogbeam Labs)
>  * Henry Saputra (Cisco Systems)
>  * Chris Burroughs (Clearspring Technologies)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
>  * Alan Cabrera (Apache Member)
>  * Geir Magnusson, Jr. (Apache Member and Director)
>  * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>



-- 
You can't be normal and expect abnormal returns - Jeffrey Pfeffer
------------------------------------------
http://www.deepinspace.net
------------------------------------------

Re: [VOTE] Kafka to join the Incubator

Posted by Ralph Goers <ra...@dslextreme.com>.

+1 (binding)

Ralph

On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:

> Hi all,
> 
> 
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
> 
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
> 
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> 
> Please cast your votes:
> 
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
> 
> This vote will close 72 hours from now.
> 
> Thanks,
> 
> Jun
> 
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
> 
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
> 
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
> 
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by not introspecting into message content, nor indexing them on the broker.
> We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
> 
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
> 
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
> 
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
> 
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and
> used in production in that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
> 
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
> 
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
> 
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper
> and log4j are hosted by the ASF and we will benefit and provide benefit by
> close proximity to them.
> 
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
> 
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
> 
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
> 
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
> 
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
> 
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
> 
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
> 
> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> * The GitHub site: [https://github.com/kafka-dev/kafka]
> * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> 
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on github
> under the Apache license at [https://github.com/kafka-dev/kafka]
> 
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
> 
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
> 
> == Cryptography ==
> Not applicable.
> 
> == Required Resources ==
> === Mailing Lists ===
> * kafka-private for private PMC discussions (with moderated subscriptions)
> * kafka-dev
> * kafka-commits
> * kafka-user
> 
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
> 
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
> 
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
> 
> == Initial Committers ==
> * Jay Kreps
> * Jun Rao
> * Neha Narkhede
> * Jakob Homan
> * Phillip Rhodes
> * Henry Saputra
> * Chris Burroughs
> 
> == Affiliations ==
> * Jay Kreps (LinkedIn)
> * Jun Rao (LinkedIn)
> * Neha Narkhede (LinkedIn)
> * Jakob Homan (LinkedIn)
> * Phillip Rhodes (Fogbeam Labs)
> * Henry Saputra (Cisco Systems)
> * Chris Burroughs (Clearspring Technologies)
> 
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
> 
> === Nominated Mentors ===
> * Alan Cabrera (Apache Member)
> * Geir Magnusson, Jr. (Apache Member and Director)
> * Owen O'Malley (Apache Member)
> 
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Kafka to join the Incubator

Posted by Mohammad Nour El-Din <no...@gmail.com>.

+1 (Binding)

On Tue, Jun 28, 2011 at 8:16 PM, Alan D. Cabrera <li...@toolazydogs.com> wrote:
> +1 binding
>
>
> Regards,
> Alan
>
> On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:
>
>> Hi all,
>>
>>
>> Since the discussion on the thread of the Kafka incubator proposal is
>> winding down, I'd like to call a vote.
>>
>> At the end of this mail, I've put a copy of the current proposal.  Here is
>> a link to the document in the wiki:
>> http://wiki.apache.org/incubator/KafkaProposal
>>
>> And here is a link to the discussion thread:
>> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>>
>> Please cast your votes:
>>
>> [  ] +1 Accept Kafka for incubation
>> [  ] +0 Indifferent to Kafka incubation
>> [  ]  -1 Reject Kafka for incubation
>>
>> This vote will close 72 hours from now.
>>
>> Thanks,
>>
>> Jun
>>
>> == Abstract ==
>> Kafka is a distributed publish-subscribe system for processing large amounts
>> of streaming data.
>>
>> == Proposal ==
>> Kafka provides an extremely high throughput distributed publish/subscribe
>> messaging system.  Additionally, it supports relatively long term
>> persistence of messages to support a wide variety of consumers, partitioning
>> of the message stream across servers and consumers, and functionality for
>> loading data into Apache Hadoop for offline, batch processing.
>>
>> == Background ==
>> Kafka was developed at LinkedIn to process the large amounts of events
>> generated by that company's website and provide a common repository for many
>> types of consumers to access and process those events. Kafka has been used
>> in production at LinkedIn scale to handle dozens of types of events
>> including page views, searches and social network activity. Kafka clusters
>> at LinkedIn currently process more than two billion events per day.
>>
>> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
>> provide low latency message delivery but don't focus on throughput, and log
>> processing systems such as Scribe and Flume, which do not provide adequate
>> latency for our diverse set of consumers.  Kafka can also be inserted into
>> traditional log-processing systems, acting as an intermediate step before
>> further processing. Kafka focuses relentlessly on performance and throughput
>> by neither introspecting message content nor indexing it on the broker.
>> We also achieve high performance by depending on Java's sendFile/transferTo
>> capabilities to minimize intermediate buffer copies and relying on the OS's
>> pagecache to efficiently serve up message contents to consumers. Kafka is
>> also designed to be scalable and it depends on Apache ZooKeeper for
>> coordination amongst its producers, brokers and consumers.
>>
>> Kafka is written in Scala. It was developed internally at LinkedIn to meet
>> our particular use cases, but will be useful to many organizations facing a
>> similar need to reliably process large amounts of streaming data.
>> Therefore, we would like to share it with the ASF and begin developing a
>> community of developers and users within Apache.
>>
>> == Rationale ==
>> Many organizations can benefit from a reliable stream processing system such
>> as Kafka.  While our use case of processing events from a very large website
>> like LinkedIn has driven the design of Kafka, its uses are varied and we
>> expect many new use cases to emerge.  Kafka provides a natural bridge
>> between near real-time event processing and offline batch processing and
>> will appeal to many users.
>>
>> == Current Status ==
>> === Meritocracy ===
>> Our intent with this incubator proposal is to start building a diverse
>> developer community around Kafka following the Apache meritocracy model.
>> Since Kafka was open sourced we have solicited contributions via the website
>> and presentations given to user groups and technical audiences.  We have had
>> positive responses to these and have received several contributions and
>> clients for other languages.  We plan to continue this support for new
>> contributors and work with those who contribute significantly to the project
>> to make them committers.
>>
>> === Community ===
>> Kafka is currently being developed by engineers within LinkedIn and is
>> used in production at that company. Additionally, we have active users in or
>> have received contributions from a diverse set of companies including
>> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> presentations of Kafka and its goals garnered much interest from potential
>> contributors. We hope to extend our contributor base significantly and
>> invite all those who are interested in building high-throughput distributed
>> systems to participate.  We have begun receiving contributions from outside
>> of LinkedIn, including clients for several languages including Ruby, PHP,
>> Clojure, .NET and Python.
>>
>> To further this goal, we use GitHub issue tracking and branching facilities,
>> as well as maintaining a public mailing list via Google Groups.
>>
>> === Core Developers ===
>> Kafka is currently being developed by four engineers at LinkedIn: Neha
>> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> Apache as a Cassandra committer and PMC member. Neha has been an active
>> contributor to several projects LinkedIn has open sourced, including Bobo,
>> Sensei and Zoie. Jay has experience with open source software as the
>> originator of the Project Voldemort project, as well as being active within
>> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> member and a previous Apache ZooKeeper contributor.
>>
>> === Alignment ===
>> The ASF is the natural choice to host the Kafka project as its goal of
>> encouraging community-driven open-source projects fits with our vision for
>> Kafka.  Additionally, many other projects with which we are familiar and
>> with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper
>> and log4j, are hosted by the ASF, and we will benefit and provide benefit by
>> close proximity to them.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> The core developers plan to work full time on the project. There is very
>> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
>> internal infrastructure and is in production use.
>>
>> === Inexperience with Open Source ===
>> All of the core developers have experience with open source development.
>> LinkedIn open sourced Kafka several months ago and has been receiving
>> contributions since.  Jun is an Apache Cassandra committer and PMC member.
>> Jay and Neha have been involved with several open source projects released
>> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
>> Hadoop committer and PMC member.
>>
>> === Homogeneous Developers ===
>> The current core developers are all from LinkedIn. However, we hope to
>> establish a developer community that includes contributors from several
>> corporations and we are actively encouraging new contributors via the mailing
>> lists and public presentations of Kafka.
>>
>> === Reliance on Salaried Developers ===
>> Currently, the developers are paid to do work on Kafka. However, once the
>> project has a community built around it, we expect to get committers,
>> developers and community from outside the current core developers. However,
>> because LinkedIn relies on Kafka internally, the reliance on salaried
>> developers is unlikely to change.
>>
>> === Relationships with Other Apache Products ===
>> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
>> to coordinate its state amongst the brokers, consumers, and soon, the
>> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
>> data directly from Kafka.  Kafka provides an appender to allow consuming
>> data directly from Apache log4j.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> While we respect the reputation of the Apache brand and have no doubts that
>> it will attract contributors and users, our interest is primarily to give
>> Kafka a solid home as an open source project following an established
>> development model. We have also given reasons in the Rationale and Alignment
>> sections.
>>
>> == Documentation ==
>> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
>> following links provide more information about the project:
>>
>> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>> * The GitHub site: [https://github.com/kafka-dev/kafka]
>> * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>> * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>>
>> == Initial Source ==
>> Kafka has been under development at LinkedIn since November 2009.  It was
>> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
>> under the Apache license at [https://github.com/kafka-dev/kafka]
>>
>> Kafka is mainly written in Scala with some performance testing code in Java.
>> Several clients have been contributed in other languages, including Ruby,
>> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
>> and relies on the simple build tool (sbt) as its build system and dependency
>> resolution mechanism.
>>
>> == External Dependencies ==
>> The dependencies all have Apache compatible licenses.
>>
>> == Cryptography ==
>> Not applicable.
>>
>> == Required Resources ==
>> === Mailing Lists ===
>> * kafka-private for private PMC discussions (with moderated subscriptions)
>> * kafka-dev
>> * kafka-commits
>> * kafka-user
>>
>> === Subversion Directory ===
>> [https://svn.apache.org/repos/asf/incubator/kafka]
>>
>> === Issue Tracking ===
>> JIRA Kafka (KAFKA)
>>
>> === Other Resources ===
>> The existing code already has unit tests, so we would like a Hudson instance
>> to run them whenever a new patch is submitted. This can be added after
>> project creation.
>>
>> == Initial Committers ==
>> * Jay Kreps
>> * Jun Rao
>> * Neha Narkhede
>> * Jakob Homan
>> * Phillip Rhodes
>> * Henry Saputra
>> * Chris Burroughs
>>
>> == Affiliations ==
>> * Jay Kreps (LinkedIn)
>> * Jun Rao (LinkedIn)
>> * Neha Narkhede (LinkedIn)
>> * Jakob Homan (LinkedIn)
>> * Phillip Rhodes (Fogbeam Labs)
>> * Henry Saputra (Cisco Systems)
>> * Chris Burroughs (Clearspring Technologies)
>>
>> == Sponsors ==
>> === Champion ===
>> Chris Douglas (Apache Member)
>>
>> === Nominated Mentors ===
>> * Alan Cabrera (Apache Member)
>> * Geir Magnusson, Jr. (Apache Member and Director)
>> * Owen O'Malley (Apache Member)
>>
>> === Sponsoring Entity ===
>> We are requesting the Incubator to sponsor this project.
>
>
>
>



-- 
Thanks
- Mohammad Nour
  Author of (WebSphere Application Server Community Edition 2.0 User Guide)
  http://www.redbooks.ibm.com/abstracts/sg247585.html
- LinkedIn: http://www.linkedin.com/in/mnour
- Blog: http://tadabborat.blogspot.com
----
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

"Writing clean code is what you must do in order to call yourself a
professional. There is no reasonable excuse for doing anything less
than your best."
- Clean Code: A Handbook of Agile Software Craftsmanship

"Stay hungry, stay foolish."
- Steve Jobs


Re: [VOTE] Kafka to join the Incubator

Posted by "Alan D. Cabrera" <li...@toolazydogs.com>.

+1 binding


Regards,
Alan

On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:

> Hi all,
> 
> 
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
> 
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
> 
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> 
> Please cast your votes:
> 
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
> 
> This vote will close 72 hours from now.
> 
> Thanks,
> 
> Jun
> 
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
> 
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
> 
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
> 
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by neither introspecting message content nor indexing it on the broker.
> We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
> 
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
> 
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
> 
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
> 
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and is
> used in production at that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
> 
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
> 
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
> 
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper
> and log4j, are hosted by the ASF, and we will benefit and provide benefit by
> close proximity to them.
> 
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
> 
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
> 
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
> 
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
> 
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
> 
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
> 
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
> 
> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> * The GitHub site: [https://github.com/kafka-dev/kafka]
> * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> 
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
> under the Apache license at [https://github.com/kafka-dev/kafka]
> 
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
> 
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
> 
> == Cryptography ==
> Not applicable.
> 
> == Required Resources ==
> === Mailing Lists ===
> * kafka-private for private PMC discussions (with moderated subscriptions)
> * kafka-dev
> * kafka-commits
> * kafka-user
> 
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
> 
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
> 
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
> 
> == Initial Committers ==
> * Jay Kreps
> * Jun Rao
> * Neha Narkhede
> * Jakob Homan
> * Phillip Rhodes
> * Henry Saputra
> * Chris Burroughs
> 
> == Affiliations ==
> * Jay Kreps (LinkedIn)
> * Jun Rao (LinkedIn)
> * Neha Narkhede (LinkedIn)
> * Jakob Homan (LinkedIn)
> * Phillip Rhodes (Fogbeam Labs)
> * Henry Saputra (Cisco Systems)
> * Chris Burroughs (Clearspring Technologies)
> 
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
> 
> === Nominated Mentors ===
> * Alan Cabrera (Apache Member)
> * Geir Magnusson, Jr. (Apache Member and Director)
> * Owen O'Malley (Apache Member)
> 
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.



Re: [VOTE] Kafka to join the Incubator

Posted by Jeffrey Damick <je...@gmail.com>.

+1

On Tue, Jun 28, 2011 at 1:00 PM, Jun Rao <ju...@gmail.com> wrote:

>
> Please cast your votes:
>
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
>
> This vote will close 72 hours from now.
>
>

Re: [VOTE] Kafka to join the Incubator

Posted by Jake Mannix <ja...@gmail.com>.

+1

On Tue, Jun 28, 2011 at 10:00 AM, Jun Rao <ju...@gmail.com> wrote:

> Hi all,
>
>
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
>
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
>
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>
> Please cast your votes:
>
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
>
> This vote will close 72 hours from now.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large
> amounts
> of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers,
> partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
>
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for
> many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ,
> which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and
> throughput
> by neither introspecting message content nor indexing it on the broker.
>  We also achieve high performance by depending on Java's
> sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
>
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system
> such
> as Kafka.  While our use case of processing events from a very large
> website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the
> website
> and presentations given to user groups and technical audiences.  We have
> had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the
> project
> to make them committers.
>
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and is
> used in production at that company. Additionally, we have active users in
> or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
>
> To further this goal, we use GitHub issue tracking and branching
> facilities,
> as well as maintaining a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper
> and log4j, are hosted by the ASF, and we will benefit and provide benefit by
> close proximity to them.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
>  LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
>  Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache
> ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
>
>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on github
> under the Apache license at [https://github.com/kafka-dev/kafka]
>
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on simple build tool (sbt) as its build system and dependency
> resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * kafka-private for private PMC discussions (with moderated subscriptions)
>  * kafka-dev
>  * kafka-commits
>  * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
>
> == Initial Committers ==
>  * Jay Kreps
>  * Jun Rao
>  * Neha Narkhede
>  * Jakob Homan
>  * Phillip Rhodes
>  * Henry Saputra
>  * Chris Burroughs
>
> == Affiliations ==
>  * Jay Kreps (LinkedIn)
>  * Jun Rao (LinkedIn)
>  * Neha Narkhede (LinkedIn)
>  * Jakob Homan (LinkedIn)
>  * Phillip Rhodes (Fogbeam Labs)
>  * Henry Saputra (Cisco Systems)
>  * Chris Burroughs (Clearspring Technologies)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
>  * Alan Cabrera (Apache Member)
>  * Geir Magnusson, Jr. (Apache Member and Director)
>  * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>

Re: [VOTE] Kafka to join the Incubator

Posted by Henry Saputra <he...@gmail.com>.

+1

- Henry

On Tue, Jun 28, 2011 at 10:57 AM, Joe Key <jo...@gmail.com> wrote:
> +1
>
> Sincerely,
> J. Andrew Key (Andy)
>
> On Tue, Jun 28, 2011 at 10:32 AM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> +1 (binding).
>>
>> Thanks!
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
>
> --
> Joe Andrew Key (Andy)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Kafka to join the Incubator

Posted by Joe Key <jo...@gmail.com>.

+1

Sincerely,
J. Andrew Key (Andy)

On Tue, Jun 28, 2011 at 10:32 AM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> +1 (binding).
>
> Thanks!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
Joe Andrew Key (Andy)

Re: [VOTE] Kafka to join the Incubator

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

+1 (binding).

Thanks!

Cheers,
Chris

On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:

> Hi all,
> 
> 
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
> 
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
> 
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> 
> Please cast your votes:
> 
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
> 
> This vote will close 72 hours from now.
> 
> Thanks,
> 
> Jun
> 
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
> 
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
> 
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
> 
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by not introspecting into message content, nor indexing them on the broker.
> We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
> 
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
> 
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
> 
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
> 
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and
> used in production in that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages such as Ruby, PHP,
> Clojure, .NET and Python.
> 
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
> 
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of Project Voldemort, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
> 
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
> ZooKeeper and log4j, are hosted by the ASF, and we will benefit and provide
> benefit by close proximity to them.
> 
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
> 
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
> 
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
> 
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. Even so,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
> 
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
> 
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
> 
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
> 
> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> * The GitHub site: [https://github.com/kafka-dev/kafka]
> * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> 
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
> under the Apache license at [https://github.com/kafka-dev/kafka].
> 
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
> 
> == External Dependencies ==
> The dependencies all have Apache-compatible licenses.
> 
> == Cryptography ==
> Not applicable.
> 
> == Required Resources ==
> === Mailing Lists ===
> * kafka-private for private PMC discussions (with moderated subscriptions)
> * kafka-dev
> * kafka-commits
> * kafka-user
> 
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
> 
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
> 
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
> 
> == Initial Committers ==
> * Jay Kreps
> * Jun Rao
> * Neha Narkhede
> * Jakob Homan
> * Phillip Rhodes
> * Henry Saputra
> * Chris Burroughs
> 
> == Affiliations ==
> * Jay Kreps (LinkedIn)
> * Jun Rao (LinkedIn)
> * Neha Narkhede (LinkedIn)
> * Jakob Homan (LinkedIn)
> * Phillip Rhodes (Fogbeam Labs)
> * Henry Saputra (Cisco Systems)
> * Chris Burroughs (Clearspring Technologies)
> 
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
> 
> === Nominated Mentors ===
> * Alan Cabrera (Apache Member)
> * Geir Magnusson, Jr. (Apache Member and Director)
> * Owen O'Malley (Apache Member)
> 
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
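
As an aside on the sendFile/transferTo point in the Background section quoted above, here is a minimal sketch of the general technique in Java, using FileChannel.transferTo. It is not taken from the Kafka source: the class name TransferToSketch, the method sendRange and the sample file contents are all hypothetical, and the in-memory sink only stands in for a consumer's socket. The copy-avoiding path generally applies only when the target is a socket channel and the OS supports it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToSketch {

    /**
     * Sends up to count bytes of logFile, starting at position, to target.
     * The file bytes never pass through a buffer allocated by this code;
     * the JVM and OS decide how the transfer is carried out.
     * Returns the number of bytes actually transferred.
     */
    public static long sendRange(Path logFile, long position, long count,
                                 WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(logFile, StandardOpenOption.READ)) {
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop.
            while (sent < count) {
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached, or nothing more could be written
                }
                sent += n;
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical "log segment": three newline-terminated messages.
        Path log = Files.createTempFile("segment", ".log");
        Files.write(log, "msg-1\nmsg-2\nmsg-3\n".getBytes(StandardCharsets.UTF_8));

        // An in-memory sink stands in for a consumer connection.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel out = Channels.newChannel(sink)) {
            long sent = sendRange(log, 6, 6, out); // byte range of the second message
            System.out.println(sent + " bytes sent: " + sink.toString("UTF-8").trim());
        }
        Files.delete(log);
    }
}
```

Because the caller only supplies a file offset and a length, a broker written this way never needs to parse or index message content, which fits the design described in the proposal.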


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Kafka to join the Incubator

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Jun 28, 2011 at 8:00 PM, Jun Rao <ju...@gmail.com> wrote:
>... Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote....

+1 (binding)

-Bertrand


Re: [VOTE] Kafka to join the Incubator

Posted by Chris Burroughs <ch...@gmail.com>.

+1

Chris Burroughs

On 06/28/2011 01:00 PM, Jun Rao wrote:
> Hi all,
> 
> 
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
> 
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
> 
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> 
> Please cast your votes:
> 
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
> 
> This vote will close 72 hours from now.
> 
> Thanks,
> 
> Jun



Re: [VOTE] Kafka to join the Incubator

Posted by Damien Katz <da...@apache.org>.

+1

Nice work Jun :)

-Damien

On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:

> Hi all,
> 
> 
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
> 
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
> 
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> 
> Please cast your votes:
> 
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
> 
> This vote will close 72 hours from now.
> 
> Thanks,
> 
> Jun
> 



Re: [VOTE] Kafka to join the Incubator

Posted by Richard Hirsch <hi...@gmail.com>.

+1 (Binding)

By the way, it is great to see another Scala-based project coming to Apache.

Dick

VP Apache ESME

On Wed, Jun 29, 2011 at 12:35 PM, Tommaso Teofili
<to...@gmail.com> wrote:
> +1 (binding)
> Tommaso
>
> 2011/6/28 Jun Rao <ju...@gmail.com>
>
>> Hi all,
>>
>>
>> Since the discussion on the thread of the Kafka incubator proposal is
>> winding down, I'd like to call a vote.
>>
>> At the end of this mail, I've put a copy of the current proposal.  Here is
>> a link to the document in the wiki:
>> http://wiki.apache.org/incubator/KafkaProposal
>>
>> And here is a link to the discussion thread:
>> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>>
>> Please cast your votes:
>>
>> [  ] +1 Accept Kafka for incubation
>> [  ] +0 Indifferent to Kafka incubation
>> [  ]  -1 Reject Kafka for incubation
>>
>> This vote will close 72 hours from now.
>>
>> Thanks,
>>
>> Jun
>>
>


Re: [VOTE] Kafka to join the Incubator

Posted by Tommaso Teofili <to...@gmail.com>.

+1 (binding)
Tommaso

2011/6/28 Jun Rao <ju...@gmail.com>

> Hi all,
>
>
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
>
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> http://wiki.apache.org/incubator/KafkaProposal
>
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>
> Please cast your votes:
>
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
>
> This vote will close 72 hours from now.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
>
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by neither introspecting message contents nor indexing them on the broker.
> We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
>
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
>
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and is
> used in production at that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages such as Ruby, PHP,
> Clojure, .NET and Python.
>
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of Project Voldemort, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects that we are familiar with and
> expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
> and log4j, are hosted by the ASF, and we will benefit and provide benefit by
> close proximity to them.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. Once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
>
>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on github
> under the Apache license at [https://github.com/kafka-dev/kafka]
>
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on simple build tool (sbt) as its build system and dependency
> resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * kafka-private for private PMC discussions (with moderated subscriptions)
>  * kafka-dev
>  * kafka-commits
>  * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
>
> == Initial Committers ==
>  * Jay Kreps
>  * Jun Rao
>  * Neha Narkhede
>  * Jakob Homan
>  * Phillip Rhodes
>  * Henry Saputra
>  * Chris Burroughs
>
> == Affiliations ==
>  * Jay Kreps (LinkedIn)
>  * Jun Rao (LinkedIn)
>  * Neha Narkhede (LinkedIn)
>  * Jakob Homan (LinkedIn)
>  * Phillip Rhodes (Fogbeam Labs)
>  * Henry Saputra (Cisco Systems)
>  * Chris Burroughs (Clearspring Technologies)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
>  * Alan Cabrera (Apache Member)
>  * Geir Magnusson, Jr. (Apache Member and Director)
>  * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>