Posted to general@incubator.apache.org by Jun Rao <ju...@gmail.com> on 2011/06/22 18:17:59 UTC

[PROPOSAL] Kafka for the Apache Incubator

Hi,

I would like to propose Kafka to be an Apache Incubator project.  Kafka is a
distributed, high throughput, publish-subscribe system for processing large
amounts of streaming data.

Here's a link to the proposal in the Incubator wiki
http://wiki.apache.org/incubator/KafkaProposal

I've also pasted the initial contents below.

Thanks,

Jun

== Abstract ==
Kafka is a distributed publish-subscribe system for processing large amounts
of streaming data.

== Proposal ==
Kafka provides an extremely high throughput distributed publish/subscribe
messaging system.  Additionally, it supports relatively long term
persistence of messages to support a wide variety of consumers, partitioning
of the message stream across servers and consumers, and functionality for
loading data into Apache Hadoop for offline, batch processing.

== Background ==
Kafka was developed at LinkedIn to process the large amounts of events
generated by that company's website and provide a common repository for many
types of consumers to access and process those events. Kafka has been used
in production at LinkedIn scale to handle dozens of types of events
including page views, searches and social network activity. Kafka clusters
at LinkedIn currently process more than two billion events per day.

Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
can provide high-volume messaging systems but lack persistence of those
messages, and log processing systems such as Scribe and Flume, which do not
provide adequate latency for our diverse set of consumers.  Kafka can also
be inserted into traditional log-processing systems, acting as an
intermediate step before further processing. Kafka focuses relentlessly on
performance and throughput: the broker neither inspects message content nor
indexes messages.  We also achieve high performance by depending on Java's
sendFile/transferTo capabilities to minimize intermediate buffer copies and
by relying on the OS's pagecache to efficiently serve up message contents to
consumers.
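
As a rough illustration of the transferTo approach mentioned above, here is a
minimal Java sketch. It is not taken from the Kafka source: the class name
ZeroCopySketch and the methods sendRange and demo are hypothetical, and the
only library call it relies on is the standard FileChannel.transferTo. When
the target is a socket channel, the JVM can often hand the copy to the
operating system and skip user-space buffers; the in-memory target used in
the self-check below only exercises the API and falls back to an ordinary
copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only; names here are hypothetical, not Kafka code.
class ZeroCopySketch {

    // Sends `count` bytes of `logFile`, starting at `offset`, to `out`.
    // FileChannel.transferTo lets the JVM delegate the copy to the OS
    // where the platform and the target channel allow it.
    static long sendRange(Path logFile, long offset, long count,
                          WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(logFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(offset + sent, count - sent, out);
                if (n <= 0) {
                    break; // nothing more to send (for example, end of file)
                }
                sent += n;
            }
            return sent;
        }
    }

    // Self-check: writes a temporary file, then sends 5 bytes starting at
    // offset 6 to an in-memory channel and returns them as a string.
    static String demo() {
        try {
            Path log = Files.createTempFile("segment", ".log");
            try {
                Files.write(log, "hello kafka".getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                sendRange(log, 6, 5, Channels.newChannel(sink));
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real server the target would be the consumer's SocketChannel rather than
an in-memory stream, and the file contents would typically already sit in the
OS pagecache, which is what makes this path cheap.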

Kafka is written in Scala and depends on Apache ZooKeeper for coordination
amongst its producers, brokers and consumers.

Kafka was developed internally at LinkedIn to meet our particular use cases,
but will be useful to many organizations facing a similar need to reliably
process large amounts of streaming data.  Therefore, we would like to share
it with the ASF and begin developing a community of developers and users
within Apache.

== Rationale ==
Many organizations can benefit from a reliable stream processing system such
as Kafka.  While our use case of processing events from a very large website
like LinkedIn has driven the design of Kafka, its uses are varied and we
expect many new use cases to emerge.  Kafka provides a natural bridge
between near real-time event processing and offline batch processing and
will appeal to many users.

== Current Status ==
=== Meritocracy ===
Our intent with this incubator proposal is to start building a diverse
developer community around Kafka following the Apache meritocracy model.
Since Kafka was open sourced we have solicited contributions via the website
and presentations given to user groups and technical audiences.  We have had
positive responses to these and have received several contributions and
clients for other languages.  We plan to continue this support for new
contributors and work with those who contribute significantly to the project
to make them committers.

=== Community ===
Kafka is currently being developed by engineers within LinkedIn and is used
in production at that company. Additionally, we have active users in, or
have received contributions from, a diverse set of companies including
MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
presentations of Kafka and its goals garnered much interest from potential
contributors. We hope to extend our contributor base significantly and
invite all those who are interested in building high-throughput distributed
systems to participate.  We have begun receiving contributions from outside
of LinkedIn, including clients for several languages such as Ruby, PHP,
Clojure, .NET and Python.

To further this goal, we use GitHub's issue tracking and branching
facilities and maintain a public mailing list via Google Groups.

=== Core Developers ===
Kafka is currently being developed by four engineers at LinkedIn: Neha
Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
Apache as a Cassandra committer and PMC member. Neha has been an active
contributor to several projects LinkedIn has open sourced, including Bobo,
Sensei and Zoie. Jay has experience with open source software as the
originator of Project Voldemort and is active within the Hadoop ecosystem
community. Jakob is an Apache Hadoop committer and PMC member and a previous
Apache ZooKeeper contributor.

=== Alignment ===
The ASF is the natural choice to host the Kafka project as its goal of
encouraging community-driven open-source projects fits with our vision for
Kafka.  Additionally, many other projects with which we are familiar, and
with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
ZooKeeper and log4j, are hosted by the ASF, and we will both benefit from
and provide benefit through close proximity to them.

== Known Risks ==
=== Orphaned Products ===
The core developers plan to work full time on the project. There is very
little risk of Kafka being abandoned as it is a critical part of LinkedIn's
internal infrastructure and is in production use.

=== Inexperience with Open Source ===
All of the core developers have experience with open source development.
LinkedIn open sourced Kafka several months ago and has been receiving
contributions since.  Jun is an Apache Cassandra committer and PMC member.
Jay and Neha have been involved with several open source projects released
by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
Hadoop committer and PMC member.

=== Homogeneous Developers ===
The current core developers are all from LinkedIn. However, we hope to
establish a developer community that includes contributors from several
corporations, and we are actively encouraging new contributors via the
mailing lists and public presentations of Kafka.

=== Reliance on Salaried Developers ===
Currently, the developers are paid to work on Kafka. Once the project has a
community built around it, we expect to gain committers, developers and
community members from outside the current core developers. However,
because LinkedIn relies on Kafka internally, the reliance on salaried
developers is unlikely to change.

=== Relationships with Other Apache Products ===
Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
to coordinate its state amongst the brokers, consumers, and soon, the
producers.  Kafka provides input formats to allow Hadoop MapReduce to load
data directly from Kafka.  Kafka provides an appender to allow consuming
data directly from Apache log4j.

=== An Excessive Fascination with the Apache Brand ===
While we respect the reputation of the Apache brand and have no doubts that
it will attract contributors and users, our interest is primarily to give
Kafka a solid home as an open source project following an established
development model. We have also given reasons in the Rationale and Alignment
sections.

== Documentation ==
Information about Kafka can be found at [http://sna-projects.com/kafka/].
The following links provide more information about the project:

 * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
 * The GitHub site: [https://github.com/kafka-dev/kafka]
 * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
 * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
 * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]

== Initial Source ==
Kafka has been under development at LinkedIn since November 2009.  It was
open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
under the Apache license at [https://github.com/kafka-dev/kafka].

Kafka is mainly written in Scala with some performance testing code in Java.
Several clients have been contributed in other languages, including Ruby,
PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
and relies on simple build tool (sbt) as its build system and dependency
resolution mechanism.

== External Dependencies ==
The dependencies all have Apache compatible licenses.

== Cryptography ==
Not applicable.

== Required Resources ==
=== Mailing Lists ===
 * kafka-private for private PMC discussions (with moderated subscriptions)
 * kafka-dev
 * kafka-commits
 * kafka-user

=== Subversion Directory ===
[https://svn.apache.org/repos/asf/incubator/kafka]

=== Issue Tracking ===
JIRA Kafka (KAFKA)

=== Other Resources ===
The existing code already has unit tests, so we would like a Hudson instance
to run them whenever a new patch is submitted. This can be added after
project creation.

== Initial Committers ==
 * Jay Kreps
 * Jun Rao
 * Neha Narkhede
 * Jakob Homan

== Affiliations ==
 * Jay Kreps (LinkedIn)
 * Jun Rao (LinkedIn)
 * Neha Narkhede (LinkedIn)
 * Jakob Homan (LinkedIn)

== Sponsors ==
=== Champion ===
Chris Douglas (Apache Member)

=== Nominated Mentors ===
 * Alan Cabrera (Apache Member)
 * Geir Magnusson, Jr. (Apache Member and Director)
 * Owen O'Malley (Apache Member)

=== Sponsoring Entity ===
We are requesting the Incubator to sponsor this project.

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Jeffrey Damick <je...@gmail.com>.

+1 from us also. At Neustar we are working on a large deployment using Kafka
as well.  We'd be interested in helping in the future.

-jeff


On Fri, Jun 24, 2011 at 2:17 PM, Henry Saputra <he...@gmail.com> wrote:

> +1
>
> A very good proposal, and it seems to help solve our need for a
> low-latency event messaging system, so I am looking forward to it.
>
> I would love to contribute to the project and have added my name to
> the list of initial committers, if there is no objection.
>
> - Henry
>

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.

+1

A very good proposal, and it seems to help solve our need for a
low-latency event messaging system, so I am looking forward to it.

I would love to contribute to the project and have added my name to
the list of initial committers, if there is no objection.

- Henry

>> 2011/6/22 Jun Rao <ju...@gmail.com>
>>
>> > Hi,
>> >
>> > I would like to propose Kafka to be an Apache Incubator project.  Kafka
>> is
>> > a
>> > distributed, high throughput, publish-subscribe system for processing
>> large
>> > amounts of streaming data.
>> >
>> > Here's a link to the proposal in the Incubator wiki
>> > http://wiki.apache.org/incubator/KafkaProposal
>> >
>> > I've also pasted the initial contents below.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > == Abstract ==
>> > Kafka is a distributed publish-subscribe system for processing large
>> > amounts
>> > of streaming data.
>> >
>> > == Proposal ==
>> > Kafka provides an extremely high throughput distributed publish/subscribe
>> > messaging system.  Additionally, it supports relatively long term
>> > persistence of messages to support a wide variety of consumers,
>> > partitioning
>> > of the message stream across servers and consumers, and functionality for
>> > loading data into Apache Hadoop for offline, batch processing.
>> >
>> > == Background ==
>> > Kafka was developed at LinkedIn to process the large amounts of events
>> > generated by that company's website and provide a common repository for
>> > many
>> > types of consumers to access and process those events. Kafka has been
>> used
>> > in production at LinkedIn scale to handle dozens of types of events
>> > including page views, searches and social network activity. Kafka
>> clusters
>> > at LinkedIn currently process more than two billion events per day.
>> >
>> > Kafka fills the gap between messaging systems such as Apache ActiveMQ,
>> > which
>> > can provide high-volume messaging systems but lack persistence of those
>> > messages, and log processing systems such as Scribe and Flume, which do
>> not
>> > provide adequate latency for our diverse set of consumers.  Kafka can
>> also
>> > be inserted into traditional log-processing systems, acting as an
>> > intermediate step before further processing. Kafka focuses relentlessly
>> on
>> > performance and throughput by not introspecting into message content, nor
>> > indexing them on the broker.  We also achieve high performance by
>> depending
>> > on Java's sendFile/transferTo capabilities to minimize intermediate
>> buffer
>> > copies and relying on the OS's pagecache to efficiently serve up message
>> > contents to consumers.
>> >
>> > Kafka is written in Scala and depends on Apache ZooKeeper for
>> coordination
>> > amongst its producers, brokers and consumers.
>> >
>> > Kafka was developed internally at LinkedIn to meet our particular use
>> > cases,
>> > but will be useful to many organizations facing a similar need to
>> reliably
>> > process large amounts of streaming data.  Therefore, we would like to
>> share
>> > it the ASF and begin developing a community of developers and users
>> within
>> > Apache.
>> >
>> > == Rationale ==
>> > Many organizations can benefit from a reliable stream processing system
>> > such
>> > as Kafka.  While our use case of processing events from a very large
>> > website
>> > like LinkedIn has driven the design of Kafka, its uses are varied and we
>> > expect many new use cases to emerge.  Kafka provides a natural bridge
>> > between near real-time event processing and offline batch processing and
>> > will appeal to many users.
>> >
>> > == Current Status ==
>> > === Meritocracy ===
>> > Our intent with this incubator proposal is to start building a diverse
>> > developer community around Kafka following the Apache meritocracy model.
>> > Since Kafka was open sourced we have solicited contributions via the
>> > website
>> > and presentations given to user groups and technical audiences.  We have
>> > had
>> > positive responses to these and have received several contributions and
>> > clients for other languages.  We plan to continue this support for new
>> > contributors and work with those who contribute significantly to the
>> > project
>> > to make them committers.
>> >
>> > === Community ===
>> > Kafka is currently being used by developed by engineers within LinkedIn
>> and
>> > used in production in that company. Additionally, we have active users in
>> > or
>> > have received contributions from a diverse set of companies including
>> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> > presentations of Kafka and its goals garnered much interest from
>> potential
>> > contributors. We hope to extend our contributor base significantly and
>> > invite all those who are interested in building high-throughput
>> distributed
>> > systems to participate.  We have begun receiving contributions from
>> outside
>> > of LinkedIn, including clients for several languages including Ruby, PHP,
>> > Clojure, .NET and Python.
>> >
>> > To further this goal, we use GitHub issue tracking and branching
>> > facilities,
>> > as well as maintaining a public mailing list via Google Groups.
>> >
>> > === Core Developers ===
>> > Kafka is currently being developed by four engineers at LinkedIn: Neha
>> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> > Apache as a Cassandra committer and PMC member. Neha has been an active
>> > contributor to several projects LinkedIn has open sourced, including
>> Bobo,
>> > Sensei and Zoie. Jay has experience with open source software as the
>> > originator of the Project Voldemort project, as well as being active
>> within
>> > the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and
>> PMC
>> > and previous Apache ZooKeeper contributor.
>> >
>> > === Alignment ===
>> > The ASF is the natural choice to host the Kafka project as its goal of
>> > encouraging community-driven open-source projects fits with our vision
>> for
>> > Kafka.  Additionally, many other projects with which we are familiar with
>> > and expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
>> > and log4j are hosted by the ASF and we will benefit and provide benefit
>> by
>> > close proximity to them.
>> >
>> > == Known Risks ==
>> > === Orphaned Products ===
>> > The core developers plan to work full time on the project. There is very
>> > little risk of Kafka being abandoned as it is a critical part of
>> LinkedIn's
>> > internal infrastructure and is in production use.
>> >
>> > === Inexperience with Open Source ===
>> > All of the core developers have experience with open source development.
>> >  LinkedIn open sourced Kafka several months ago and has been receiving
>> > contributions since.  Jun is an Apache Cassandra committer and PMC
>> member.
>> >  Jay and Neha have been involved with several open source projects
>> released
>> > by LinkedIn.  Jakob has been actively involved with the ASF as a
>> full-time
>> > Hadoop committer and PMC member.
>> >
>> > === Homogeneous Developers ===
>> > The current core developers are all from LinkedIn. However, we hope to
>> > establish a developer community that includes contributors from several
>> > corporations and we actively encouraging new contributors via the mailing
>> > lists and public presentations of Kafka.
>> >
>> > === Reliance on Salaried Developers ===
>> > Currently, the developers are paid to do work on Kafka. However, once the
>> > project has a community built around it, we expect to get committers,
>> > developers and community from outside the current core developers.
>> However,
>> > because LinkedIn relies on Kafka internally, the reliance on salaried
>> > developers is unlikely to change.
>> >
>> > === Relationships with Other Apache Products ===
>> > Kafka is deeply integrated with Apache products. Kafka uses Apache
>> > ZooKeeper
>> > to coordinate its state amongst the brokers, consumers, and soon, the
>> > producers.  Kafka provides input formats to allow Hadoop MapReduce to
>> load
>> > data directly from Kafka.  Kafka provides an appender to allow consuming
>> > data directly from Apache log4j.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> > While we respect the reputation of the Apache brand and have no doubts
>> that
>> > it will attract contributors and users, our interest is primarily to give
>> > Kafka a solid home as an open source project following an established
>> > development model. We have also given reasons in the Rationale and
>> > Alignment
>> > sections.
>> >
>> > == Documentation ==
>> > Information about Kafka can be found at [http://sna-projects.com/kafka/]
>> > The
>> > following links provide more information about the project:
>> >
>> >  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>> >  * The GitHub site: [https://github.com/kafka-dev/kafka]
>> >  * Kafka overview from Jay Kreps: [
>> > http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>> >  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>> >  * Kafka paper at NetDB 2011: [
>> >
>> >
>> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>> > ]
>> >
>> > == Initial Source ==
>> > Kafka has been under development at LinkedIn since November 2009.  It was
>> > open sourced by LinkedIn in January 2011.  It is currently hosted on
>> github
>> > under the Apache license at [https://github.com/kafka-dev/kafka]
>> >
>> > Kafka is mainly written in Scala with some performance testing code in
>> > Java.
>> >  Several clients have been contributed in other languages, including
>> Ruby,
>> > PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
>> > and relies on the simple build tool (sbt) as its build system and dependency
>> > resolution mechanism.
>> >
>> > == External Dependencies ==
>> > The dependencies all have Apache compatible licenses.
>> >
>> > == Cryptography ==
>> > Not applicable.
>> >
>> > == Required Resources ==
>> > === Mailing Lists ===
>> >  * kafka-private for private PMC discussions (with moderated subscriptions)
>> >  * kafka-dev
>> >  * kafka-commits
>> >  * kafka-user
>> >
>> > === Subversion Directory ===
>> > [https://svn.apache.org/repos/asf/incubator/kafka]
>> >
>> > === Issue Tracking ===
>> > JIRA Kafka (KAFKA)
>> >
>> > === Other Resources ===
>> > The existing code already has unit tests, so we would like a Hudson
>> > instance
>> > to run them whenever a new patch is submitted. This can be added after
>> > project creation.
>> >
>> > == Initial Committers ==
>> >  * Jay Kreps
>> >  * Jun Rao
>> >  * Neha Narkhede
>> >  * Jakob Homan
>> >
>> > == Affiliations ==
>> >  * Jay Kreps (LinkedIn)
>> >  * Jun Rao (LinkedIn)
>> >  * Neha Narkhede (LinkedIn)
>> >  * Jakob Homan (LinkedIn)
>> >
>> > == Sponsors ==
>> > === Champion ===
>> > Chris Douglas (Apache Member)
>> >
>> > === Nominated Mentors ===
>> >  * Alan Cabrera (Apache Member)
>> >  * Geir Magnusson, Jr. (Apache Member and Director)
>> >  * Owen O'Malley (Apache Member)
>> >
>> > === Sponsoring Entity ===
>> > We are requesting the Incubator to sponsor this project.
>> >
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Chris Burroughs <ch...@gmail.com>.

On 06/24/2011 04:00 AM, Ioannis Canellos wrote:
> I have a comment to add. ActiveMQ does provide means for persisting the
> messages, so you might want to clarify or rephrase that.

I'm sure Jun will clarify the proposal but for those interested in more
elaboration on this point check out the NetDB paper.

http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf


Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Ioannis Canellos <io...@gmail.com>.

Jun,
I will definitely let you know!
Till then I wish you the best of luck with the proposal!

-- 
*Ioannis Canellos*
*
 http://iocanel.blogspot.com

Apache Karaf <http://karaf.apache.org/> Committer & PMC
Apache ServiceMix <http://servicemix.apache.org/>  Committer
*

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Jun Rao <ju...@gmail.com>.

Ioannis,

That's fine. I just want to point out that Kafka does/will have some
differences from ActiveMQ given (1) its simple storage format; (2) how it
uses ZooKeeper for distributed coordination; (3) the future replication work
(http://linkedin.jira.com/browse/KAFKA-23). Let us know if you become
interested again in the future.

Thanks,

Jun

On Fri, Jun 24, 2011 at 3:40 AM, Ioannis Canellos <io...@gmail.com> wrote:

> It seems that Kafka overlaps with ActiveMQ more than I initially estimated,
> which certainly decreases my personal interest in Kafka. So I see fit to
> remove myself from the list.
>
> --
> *Ioannis Canellos*
> *
>  http://iocanel.blogspot.com
>
> Apache Karaf <http://karaf.apache.org/> Committer & PMC
> Apache ServiceMix <http://servicemix.apache.org/>  Committer
> *
>

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Ioannis Canellos <io...@gmail.com>.

It seems that Kafka overlaps with ActiveMQ more than I initially estimated,
which certainly decreases my personal interest in Kafka. So I see fit to
remove myself from the list.

-- 
*Ioannis Canellos*
*
 http://iocanel.blogspot.com

Apache Karaf <http://karaf.apache.org/> Committer & PMC
Apache ServiceMix <http://servicemix.apache.org/>  Committer
*

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Jun Rao <ju...@gmail.com>.

Thanks for the comments, Ioannis. Will refine the proposal. Welcome aboard!

Jun

On Fri, Jun 24, 2011 at 1:00 AM, Ioannis Canellos <io...@gmail.com> wrote:

> Great!
>
> Here is my +1
>
> I have a comment to add. ActiveMQ does provide means for persisting the
> messages, so you might want to clarify or rephrase that.
>
> I would like to participate in this effort so I've added myself to the
> Initial Committers list.
>
> --
> *Ioannis Canellos*
> *
>  http://iocanel.blogspot.com
>
> Apache Karaf <http://karaf.apache.org/> Committer & PMC
> Apache ServiceMix <http://servicemix.apache.org/>  Committer
> *
>

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Ioannis Canellos <io...@gmail.com>.

Great!

Here is my +1

I have a comment to add. ActiveMQ does provide means for persisting the
messages, so you might want to clarify or rephrase that.

I would like to participate in this effort so I've added myself to the
Initial Committers list.

-- 
*Ioannis Canellos*
*
 http://iocanel.blogspot.com

Apache Karaf <http://karaf.apache.org/> Committer & PMC
Apache ServiceMix <http://servicemix.apache.org/>  Committer
*

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Jun Rao <ju...@gmail.com>.

Thanks, Tommaso, Chris and Mattmann.

Jun

On Thu, Jun 23, 2011 at 8:07 AM, Tommaso Teofili <to...@gmail.com> wrote:

> Wow, very nice proposal guys!
> Tommaso
>
> 2011/6/22 Jun Rao <ju...@gmail.com>
>
> > Hi,
> >
> > I would like to propose Kafka to be an Apache Incubator project.  Kafka
> is
> > a
> > distributed, high throughput, publish-subscribe system for processing
> large
> > amounts of streaming data.
> >
> > Here's a link to the proposal in the Incubator wiki
> > http://wiki.apache.org/incubator/KafkaProposal
> >
> > I've also pasted the initial contents below.
> >
> > Thanks,
> >
> > Jun
> >
> > == Abstract ==
> > Kafka is a distributed publish-subscribe system for processing large
> > amounts
> > of streaming data.
> >
> > == Proposal ==
> > Kafka provides an extremely high throughput distributed publish/subscribe
> > messaging system.  Additionally, it supports relatively long term
> > persistence of messages to support a wide variety of consumers,
> > partitioning
> > of the message stream across servers and consumers, and functionality for
> > loading data into Apache Hadoop for offline, batch processing.
> >
> > == Background ==
> > Kafka was developed at LinkedIn to process the large amounts of events
> > generated by that company's website and provide a common repository for
> > many
> > types of consumers to access and process those events. Kafka has been
> used
> > in production at LinkedIn scale to handle dozens of types of events
> > including page views, searches and social network activity. Kafka
> clusters
> > at LinkedIn currently process more than two billion events per day.
> >
> > Kafka fills the gap between messaging systems such as Apache ActiveMQ,
> > which
> > can provide high-volume messaging systems but lack persistence of those
> > messages, and log processing systems such as Scribe and Flume, which do
> not
> > provide adequate latency for our diverse set of consumers.  Kafka can
> also
> > be inserted into traditional log-processing systems, acting as an
> > intermediate step before further processing. Kafka focuses relentlessly
> on
> > performance and throughput by not introspecting into message content, nor
> > indexing them on the broker.  We also achieve high performance by
> depending
> > on Java's sendFile/transferTo capabilities to minimize intermediate
> buffer
> > copies and relying on the OS's pagecache to efficiently serve up message
> > contents to consumers.
> >
> > Kafka is written in Scala and depends on Apache ZooKeeper for
> coordination
> > amongst its producers, brokers and consumers.
> >
> > Kafka was developed internally at LinkedIn to meet our particular use
> > cases,
> > but will be useful to many organizations facing a similar need to
> reliably
> > process large amounts of streaming data.  Therefore, we would like to share
> > it with the ASF and begin developing a community of developers and users
> > within Apache.
> >
> > == Rationale ==
> > Many organizations can benefit from a reliable stream processing system
> > such
> > as Kafka.  While our use case of processing events from a very large
> > website
> > like LinkedIn has driven the design of Kafka, its uses are varied and we
> > expect many new use cases to emerge.  Kafka provides a natural bridge
> > between near real-time event processing and offline batch processing and
> > will appeal to many users.
> >
> > == Current Status ==
> > === Meritocracy ===
> > Our intent with this incubator proposal is to start building a diverse
> > developer community around Kafka following the Apache meritocracy model.
> > Since Kafka was open sourced we have solicited contributions via the
> > website
> > and presentations given to user groups and technical audiences.  We have
> > had
> > positive responses to these and have received several contributions and
> > clients for other languages.  We plan to continue this support for new
> > contributors and work with those who contribute significantly to the
> > project
> > to make them committers.
> >
> > === Community ===
> > Kafka is currently being developed by engineers within LinkedIn and is
> > used in production in that company. Additionally, we have active users in
> > or
> > have received contributions from a diverse set of companies including
> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> > presentations of Kafka and its goals garnered much interest from
> potential
> > contributors. We hope to extend our contributor base significantly and
> > invite all those who are interested in building high-throughput
> distributed
> > systems to participate.  We have begun receiving contributions from
> outside
> > of LinkedIn, including clients for several languages including Ruby, PHP,
> > Clojure, .NET and Python.
> >
> > To further this goal, we use GitHub issue tracking and branching
> > facilities,
> > as well as maintaining a public mailing list via Google Groups.
> >
> > === Core Developers ===
> > Kafka is currently being developed by four engineers at LinkedIn: Neha
> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> > Apache as a Cassandra committer and PMC member. Neha has been an active
> > contributor to several projects LinkedIn has open sourced, including
> Bobo,
> > Sensei and Zoie. Jay has experience with open source software as the
> > originator of the Project Voldemort project, as well as being active
> within
> > the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> > member and a previous Apache ZooKeeper contributor.
> >
> > === Alignment ===
> > The ASF is the natural choice to host the Kafka project as its goal of
> > encouraging community-driven open-source projects fits with our vision
> for
> > Kafka.  Additionally, many other projects that we are familiar with
> > and expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
> > and log4j are hosted by the ASF and we will benefit and provide benefit
> by
> > close proximity to them.
> >
> > == Known Risks ==
> > === Orphaned Products ===
> > The core developers plan to work full time on the project. There is very
> > little risk of Kafka being abandoned as it is a critical part of
> LinkedIn's
> > internal infrastructure and is in production use.
> >
> > === Inexperience with Open Source ===
> > All of the core developers have experience with open source development.
> >  LinkedIn open sourced Kafka several months ago and has been receiving
> > contributions since.  Jun is an Apache Cassandra committer and PMC
> member.
> >  Jay and Neha have been involved with several open source projects
> released
> > by LinkedIn.  Jakob has been actively involved with the ASF as a
> full-time
> > Hadoop committer and PMC member.
> >
> > === Homogeneous Developers ===
> > The current core developers are all from LinkedIn. However, we hope to
> > establish a developer community that includes contributors from several
> > corporations and we are actively encouraging new contributors via the mailing
> > lists and public presentations of Kafka.
> >
> > === Reliance on Salaried Developers ===
> > Currently, the developers are paid to do work on Kafka. However, once the
> > project has a community built around it, we expect to get committers,
> > developers and community from outside the current core developers.
> However,
> > because LinkedIn relies on Kafka internally, the reliance on salaried
> > developers is unlikely to change.
> >
> > === Relationships with Other Apache Products ===
> > Kafka is deeply integrated with Apache products. Kafka uses Apache
> > ZooKeeper
> > to coordinate its state amongst the brokers, consumers, and soon, the
> > producers.  Kafka provides input formats to allow Hadoop MapReduce to
> load
> > data directly from Kafka.  Kafka provides an appender to allow consuming
> > data directly from Apache log4j.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we respect the reputation of the Apache brand and have no doubts
> that
> > it will attract contributors and users, our interest is primarily to give
> > Kafka a solid home as an open source project following an established
> > development model. We have also given reasons in the Rationale and
> > Alignment
> > sections.
> >
> > == Documentation ==
> > Information about Kafka can be found at [http://sna-projects.com/kafka/].
> > The following links provide more information about the project:
> >
> >  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> >  * The GitHub site: [https://github.com/kafka-dev/kafka]
> >  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> >  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> >  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> >
> > == Initial Source ==
> > Kafka has been under development at LinkedIn since November 2009.  It was
> > open sourced by LinkedIn in January 2011.  It is currently hosted on
> github
> > under the Apache license at [https://github.com/kafka-dev/kafka]
> >
> > Kafka is mainly written in Scala with some performance testing code in
> > Java.
> >  Several clients have been contributed in other languages, including
> Ruby,
> > PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> > and relies on the simple build tool (sbt) as its build system and dependency
> > resolution mechanism.
> >
> > == External Dependencies ==
> > The dependencies all have Apache compatible licenses.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Required Resources ==
> > === Mailing Lists ===
> >  * kafka-private for private PMC discussions (with moderated subscriptions)
> >  * kafka-dev
> >  * kafka-commits
> >  * kafka-user
> >
> > === Subversion Directory ===
> > [https://svn.apache.org/repos/asf/incubator/kafka]
> >
> > === Issue Tracking ===
> > JIRA Kafka (KAFKA)
> >
> > === Other Resources ===
> > The existing code already has unit tests, so we would like a Hudson
> > instance
> > to run them whenever a new patch is submitted. This can be added after
> > project creation.
> >
> > == Initial Committers ==
> >  * Jay Kreps
> >  * Jun Rao
> >  * Neha Narkhede
> >  * Jakob Homan
> >
> > == Affiliations ==
> >  * Jay Kreps (LinkedIn)
> >  * Jun Rao (LinkedIn)
> >  * Neha Narkhede (LinkedIn)
> >  * Jakob Homan (LinkedIn)
> >
> > == Sponsors ==
> > === Champion ===
> > Chris Douglas (Apache Member)
> >
> > === Nominated Mentors ===
> >  * Alan Cabrera (Apache Member)
> >  * Geir Magnusson, Jr. (Apache Member and Director)
> >  * Owen O'Malley (Apache Member)
> >
> > === Sponsoring Entity ===
> > We are requesting the Incubator to sponsor this project.
> >
>

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Tommaso Teofili <to...@gmail.com>.

Wow, very nice proposal guys!
Tommaso

2011/6/22 Jun Rao <ju...@gmail.com>

> Hi,
>
> I would like to propose Kafka to be an Apache Incubator project.  Kafka is
> a
> distributed, high throughput, publish-subscribe system for processing large
> amounts of streaming data.
>
> Here's a link to the proposal in the Incubator wiki
> http://wiki.apache.org/incubator/KafkaProposal
>
> I've also pasted the initial contents below.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large
> amounts
> of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers,
> partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
>
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for
> many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ,
> which
> can provide high-volume messaging systems but lack persistence of those
> messages, and log processing systems such as Scribe and Flume, which do not
> provide adequate latency for our diverse set of consumers.  Kafka can also
> be inserted into traditional log-processing systems, acting as an
> intermediate step before further processing. Kafka focuses relentlessly on
> performance and throughput by not introspecting into message content, nor
> indexing them on the broker.  We also achieve high performance by depending
> on Java's sendFile/transferTo capabilities to minimize intermediate buffer
> copies and relying on the OS's pagecache to efficiently serve up message
> contents to consumers.
>
> Kafka is written in Scala and depends on Apache ZooKeeper for coordination
> amongst its producers, brokers and consumers.
>
> Kafka was developed internally at LinkedIn to meet our particular use
> cases,
> but will be useful to many organizations facing a similar need to reliably
> process large amounts of streaming data.  Therefore, we would like to share
> it with the ASF and begin developing a community of developers and users within
> Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system
> such
> as Kafka.  While our use case of processing events from a very large
> website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the
> website
> and presentations given to user groups and technical audiences.  We have
> had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the
> project
> to make them committers.
>
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and is
> used in production in that company. Additionally, we have active users in
> or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
>
> To further this goal, we use GitHub issue tracking and branching
> facilities,
> as well as maintaining a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects that we are familiar with
> and expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
> and log4j are hosted by the ASF and we will benefit and provide benefit by
> close proximity to them.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
>  LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
>  Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache
> ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and
> Alignment
> sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/].
> The following links provide more information about the project:
>
>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on github
> under the Apache license at [https://github.com/kafka-dev/kafka]
>
> Kafka is mainly written in Scala with some performance testing code in
> Java.
>  Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * kafka-private for private PMC discussions (with moderated subscriptions)
>  * kafka-dev
>  * kafka-commits
>  * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson
> instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
>
> == Initial Committers ==
>  * Jay Kreps
>  * Jun Rao
>  * Neha Narkhede
>  * Jakob Homan
>
> == Affiliations ==
>  * Jay Kreps (LinkedIn)
>  * Jun Rao (LinkedIn)
>  * Neha Narkhede (LinkedIn)
>  * Jakob Homan (LinkedIn)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
>  * Alan Cabrera (Apache Member)
>  * Geir Magnusson, Jr. (Apache Member and Director)
>  * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Wow looks neat!

Cheers,
Chris

On Jun 22, 2011, at 9:17 AM, Jun Rao wrote:

> Hi,
> 
> I would like to propose Kafka to be an Apache Incubator project.  Kafka is a
> distributed, high throughput, publish-subscribe system for processing large
> amounts of streaming data.
> 
> Here's a link to the proposal in the Incubator wiki
> http://wiki.apache.org/incubator/KafkaProposal
> 
> I've also pasted the initial contents below.
> 
> Thanks,
> 
> Jun
> 
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
> 
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
> 
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
> 
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> can provide high-volume messaging systems but lack persistence of those
> messages, and log processing systems such as Scribe and Flume, which do not
> provide adequate latency for our diverse set of consumers.  Kafka can also
> be inserted into traditional log-processing systems, acting as an
> intermediate step before further processing. Kafka focuses relentlessly on
> performance and throughput by not introspecting into message content, nor
> indexing them on the broker.  We also achieve high performance by depending
> on Java's sendFile/transferTo capabilities to minimize intermediate buffer
> copies and relying on the OS's pagecache to efficiently serve up message
> contents to consumers.
> 
> Kafka is written in Scala and depends on Apache ZooKeeper for coordination
> amongst its producers, brokers and consumers.
> 
> Kafka was developed internally at LinkedIn to meet our particular use cases,
> but will be useful to many organizations facing a similar need to reliably
> process large amounts of streaming data.  Therefore, we would like to share
> it with the ASF and begin developing a community of developers and users within
> Apache.
> 
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
> 
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
> 
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and is
> used in production in that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, including clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
> 
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
> 
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
> 
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects that we are familiar with
> and expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
> and log4j are hosted by the ASF and we will benefit and provide benefit by
> close proximity to them.
> 
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
> 
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
> 
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations, and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
> 
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. Still,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
> 
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
> 
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
> 
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> following links provide more information about the project:
> 
> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> * The GitHub site: [https://github.com/kafka-dev/kafka]
> * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> 
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
> under the Apache license at [https://github.com/kafka-dev/kafka].
> 
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on simple build tool (sbt) as its build system and dependency
> resolution mechanism.
> 
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
> 
> == Cryptography ==
> Not applicable.
> 
> == Required Resources ==
> === Mailing Lists ===
> * kafka-private for private PMC discussions (with moderated subscriptions)
> * kafka-dev
> * kafka-commits
> * kafka-user
> 
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
> 
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
> 
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
> 
> == Initial Committers ==
> * Jay Kreps
> * Jun Rao
> * Neha Narkhede
> * Jakob Homan
> 
> == Affiliations ==
> * Jay Kreps (LinkedIn)
> * Jun Rao (LinkedIn)
> * Neha Narkhede (LinkedIn)
> * Jakob Homan (LinkedIn)
> 
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
> 
> === Nominated Mentors ===
> * Alan Cabrera (Apache Member)
> * Geir Magnusson, Jr. (Apache Member and Director)
> * Owen O'Malley (Apache Member)
> 
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Jake Mannix <ja...@gmail.com>.

+1

Both from the perspective of incorporating it in streaming machine-learning
work in Mahout, and from the perspective of a persistent scalable WAL
(*especially* once http://linkedin.jira.com/browse/KAFKA-23 gets finished up),
I'm very interested, and I know some more folks at Twitter who are interested
as well.

On Fri, Jun 24, 2011 at 5:51 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Wed, Jun 22, 2011 at 9:47 PM, Jun Rao <ju...@gmail.com> wrote:
> > Hi,
> >
> > I would like to propose Kafka to be an Apache Incubator project.  Kafka is a
> > distributed, high throughput, publish-subscribe system for processing large
> > amounts of streaming data.
> >
> > Here's a link to the proposal in the Incubator wiki
> > http://wiki.apache.org/incubator/KafkaProposal
> >
>
> +1
>
> I had evaluated Kafka for an internal project and came back impressed.
> Great to see it moving to Apache!
>
> --
> Regards,
> Shalin Shekhar Mangar.

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Jun Rao <ju...@gmail.com>.

Thanks Shalin, Phillip. Welcome aboard, Phillip.

Jun

On Fri, Jun 24, 2011 at 6:34 AM, Phillip Rhodes
<mo...@gmail.com> wrote:

> On Fri, Jun 24, 2011 at 8:51 AM, Shalin Shekhar Mangar <
> shalinmangar@gmail.com> wrote:
>
> > On Wed, Jun 22, 2011 at 9:47 PM, Jun Rao <ju...@gmail.com> wrote:
> > > Hi,
> > >
> > > I would like to propose Kafka to be an Apache Incubator project.  Kafka is a
> > > distributed, high throughput, publish-subscribe system for processing large
> > > amounts of streaming data.
> > >
> > > Here's a link to the proposal in the Incubator wiki
> > > http://wiki.apache.org/incubator/KafkaProposal
> > >
> >
>
> +1
>
> Also, I'm willing to volunteer to help with this project. I've added my
> name to the proposal wiki under "initial committers."  If anybody wants to
> know more about who I am, there's probably an "intro" email from me already
> in the Incubator email archives, or I can certainly post again if anybody
> cares.
>
>
> Phillip Rhodes
>

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Phillip Rhodes <mo...@gmail.com>.

On Fri, Jun 24, 2011 at 8:51 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Wed, Jun 22, 2011 at 9:47 PM, Jun Rao <ju...@gmail.com> wrote:
> > Hi,
> >
> > I would like to propose Kafka to be an Apache Incubator project.  Kafka is a
> > distributed, high throughput, publish-subscribe system for processing large
> > amounts of streaming data.
> >
> > Here's a link to the proposal in the Incubator wiki
> > http://wiki.apache.org/incubator/KafkaProposal
> >
>

+1

Also, I'm willing to volunteer to help with this project. I've added my
name to the proposal wiki under "initial committers."  If anybody wants to
know more about who I am, there's probably an "intro" email from me already
in the Incubator email archives, or I can certainly post again if anybody
cares.

Phillip Rhodes

Re: [PROPOSAL] Kafka for the Apache Incubator

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Wed, Jun 22, 2011 at 9:47 PM, Jun Rao <ju...@gmail.com> wrote:
> Hi,
>
> I would like to propose Kafka to be an Apache Incubator project.  Kafka is a
> distributed, high throughput, publish-subscribe system for processing large
> amounts of streaming data.
>
> Here's a link to the proposal in the Incubator wiki
> http://wiki.apache.org/incubator/KafkaProposal
>

+1

I had evaluated Kafka for an internal project and came back impressed.
Great to see it moving to Apache!

-- 
Regards,
Shalin Shekhar Mangar.
