Posted to general@incubator.apache.org by Byung-Gon Chun <bg...@gmail.com> on 2014/08/09 07:40:23 UTC

[VOTE] Accept REEF into the Apache Incubator

Hi,

Thanks for participating in the proposal discussion on REEF. The discussion
has calmed. I would like to call a vote for acceptance of REEF into the
Apache Incubator.

The proposal is attached below, and it is also available at
https://wiki.apache.org/incubator/ReefProposal

Let's keep this vote open for three business days, closing the voting on
August 11, 11:59PM (PDT).

[] +1 Accept REEF into the Incubator
[] 0 Don't care
[] -1 Don't accept REEF because...

Thanks!
-Gon

-- 
Byung-Gon Chun


# REEFProposal - Incubator


# Abstract

REEF (Retainable Evaluator Execution Framework) is a scale-out
computing fabric that eases the development of Big Data applications
on top of resource managers such as Apache YARN and Mesos.


# Proposal

REEF is a Big Data system that makes it easy to implement scalable,
fault-tolerant runtime environments for a range of data processing
models (e.g., graph processing and machine learning) on top of
resource managers such as Apache YARN and Mesos. REEF can efficiently
run multiple heterogeneous frameworks, as well as workflows that
combine them.

Additionally, REEF contains two libraries that are of independent
value: Wake, an event-based programming framework inspired by Rx and
SEDA, and Tang, a dependency injection framework inspired by Google
Guice but designed specifically for configuring distributed systems.
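The SEDA idea behind Wake can be illustrated with a minimal sketch: each stage accepts events through an Rx-style `onNext` callback, queues them, and processes them on its own thread pool, so stages can be chained into a pipeline. The `Handler` and `Stage` names below are hypothetical and chosen only for this illustration; they are not Wake's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical Rx-style handler interface for this sketch; not Wake's actual API.
interface Handler<T> {
  void onNext(T event);
}

// A SEDA-style stage: callers hand events to onNext and return immediately;
// a private thread pool (with its own unbounded queue) delivers them downstream.
final class Stage<T> implements Handler<T>, AutoCloseable {
  private final Handler<T> downstream;
  private final ExecutorService pool;

  Stage(Handler<T> downstream, int threads) {
    this.downstream = downstream;
    this.pool = Executors.newFixedThreadPool(threads);
  }

  @Override
  public void onNext(T event) {
    pool.execute(() -> downstream.onNext(event));
  }

  // Stop accepting new events and wait for queued ones to drain.
  @Override
  public void close() {
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

public class WakeSketch {
  // Two chained stages: measure word lengths, then collect them.
  static List<Integer> runDemo(List<String> words) {
    List<Integer> lengths = Collections.synchronizedList(new ArrayList<>());
    Stage<Integer> collect = new Stage<>(lengths::add, 1);
    Stage<String> measure = new Stage<>(w -> collect.onNext(w.length()), 2);
    for (String w : words) {
      measure.onNext(w);
    }
    // Close upstream first so every event is forwarded before the downstream pool drains.
    measure.close();
    collect.close();
    List<Integer> sorted = new ArrayList<>(lengths);
    Collections.sort(sorted); // completion order across threads is nondeterministic
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(runDemo(List.of("reef", "wake", "tang", "evaluator")));
  }
}
```

Giving each stage its own queue and threads is the core SEDA design choice: a slow stage backs up its own queue instead of blocking the code that feeds it.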


# Background

The resource management layer, exemplified by Apache YARN and Mesos, has
emerged as a critical layer in the new scale-out data processing
stack: resource managers assume the responsibility of multiplexing a
cluster of shared-nothing machines across heterogeneous
applications. They operate behind an interface for leasing containers
(slices of a machine's resources) to computations in an elastic
fashion. However, building data processing frameworks directly on this
layer comes at a high cost: each framework must tackle the same
challenges (e.g., fault tolerance, task scheduling and coordination)
and reimplement common mechanisms (e.g., caching, bulk transfers).

REEF provides a reusable control plane for scheduling and coordinating
task-level work on cluster resource managers. The REEF design enables
sophisticated optimizations, such as container reuse and data
caching, and facilitates workflows that span multiple
frameworks. Examples include pipelining data between different
operators in a relational system, retaining state across iterations in
iterative or recursive data flows, and passing the result of a
MapReduce job to a machine learning computation.
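The value of retaining a container and its state across iterations can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `RetainedEvaluator`, `runTask`, `dataset`, and the `hdfs:///hypothetical/input` path are made-up names for illustration rather than REEF's API, and the dataset "load" is a fixed in-memory array standing in for a read from storage. Several iterative tasks run on the same evaluator, so the dataset is loaded once instead of once per task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a retained container ("evaluator"); not REEF's API.
// It outlives individual tasks and keeps an in-memory cache between them.
final class RetainedEvaluator {
  private final Map<String, double[]> cache = new HashMap<>();
  private int loads = 0; // number of times data was loaded from "storage"

  // Returns the cached dataset, loading it only on first access.
  // The fixed array stands in for an expensive read from a distributed file system.
  double[] dataset(String path) {
    return cache.computeIfAbsent(path, p -> {
      loads++;
      return new double[] {1.0, 2.0, 3.0, 4.0};
    });
  }

  // Runs one task on this evaluator; the task sees any state earlier tasks left behind.
  <R> R runTask(Function<RetainedEvaluator, R> task) {
    return task.apply(this);
  }

  int loads() {
    return loads;
  }
}

public class ReuseSketch {
  static final class Result {
    final double weight;
    final int loads;

    Result(double weight, int loads) {
      this.weight = weight;
      this.loads = loads;
    }
  }

  // Toy iterative job: each iteration is a separate task that takes one
  // gradient step moving w toward the mean of the cached dataset.
  static Result iterate(int iterations) {
    RetainedEvaluator evaluator = new RetainedEvaluator();
    double w = 0.0;
    for (int i = 0; i < iterations; i++) {
      final double current = w;
      w = evaluator.runTask(ev -> {
        double[] xs = ev.dataset("hdfs:///hypothetical/input");
        double grad = 0.0;
        for (double x : xs) {
          grad += current - x;
        }
        return current - 0.1 * grad / xs.length;
      });
    }
    return new Result(w, evaluator.loads());
  }

  public static void main(String[] args) {
    Result r = iterate(10);
    System.out.printf("w = %.4f after 10 tasks, dataset loaded %d time(s)%n", r.weight, r.loads);
  }
}
```

Without a retained evaluator, each iteration would be scheduled in a fresh container and would have to reload its input; keeping the evaluator alive turns that repeated cost into a one-time one.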


# Rationale

Since REEF is a library that makes it easy to write distributed
applications on top of Apache YARN or Mesos, the Apache Software Foundation,
which already hosts those resource managers, is the perfect home for REEF.


# Current Status

REEF has been developed mostly by Microsoft, UCLA, and Seoul
National University. The REEF codebase is open-sourced under the Apache
License 2.0 and is currently hosted in a public repository on
GitHub.


# Meritocracy

We plan to build a strong open community by following the Apache
meritocracy principles. We will work with those who contribute
significantly to the project and invite them to be its committers.


# Community

REEF is currently used internally at Microsoft. In addition, SK
Telecom is building its data analytics infrastructure on top of REEF in
collaboration with Seoul National University. We hope to extend our
contributor base by becoming an Apache Incubator project. We expect REEF
to attract developers who are interested in creating common building
blocks that simplify the development of large-scale big data
applications.


# Core Developers

Core developers are engineers from Microsoft, Purestorage, UCB, UCLA,
UW and Seoul National University.


# Alignment

REEF depends on many Apache projects. It is built on resource
managers such as Apache YARN and Apache Mesos, and it uses HDFS as a
distributed storage layer.


# Known Risks
## Orphaned Products

The risk of REEF being orphaned is small because Microsoft products
are built on REEF. The core REEF developers continue to work on REEF
at Microsoft, UCLA, and Seoul National University, and other
institutions have shown interest in using REEF as their
infrastructure.

## Inexperience with Open Source

Several core developers have experience with open source development.
REEF committers will be guided by mentors with strong backgrounds in
Apache open source projects.

## Homogeneous Developers

The initial committers come from several institutions, including
Microsoft, Purestorage, SK Telecom, UCB, UCLA, UW, and Seoul National
University.

## Reliance on Salaried Developers

Developers from Microsoft are paid to work on REEF. Since REEF is
used internally at Microsoft, Microsoft will continue to support
developers working on REEF. Engineers and graduate students from UCLA,
UCB, UW, and Seoul National University also contribute to REEF. We plan
to attract active developers from other institutions.

## Relationships with Other Apache Products

Given REEF's position in the big data stack, there are three
relationships to consider: projects that fit below, on top of, or
alongside REEF in the stack.

### Below REEF: Mesos and YARN

REEF is designed to facilitate application development on top of
resource managers. Hence, its relationship with these resource
managers is symbiotic by design.

### On Top of REEF

Apache Spark, Giraph, MapReduce, and Flink are only some of the
projects that logically belong at a higher layer of the big data stack
than REEF. None of these currently use REEF, and each has had to
solve some of the issues REEF addresses on its own. Our goal is for
REEF to help developers create an even richer set of future big data
frameworks.

### Alongside REEF

Apache hosts several projects that build intermediate library layers on
top of a resource management platform. Twill, Slider, and Tez are
notable examples in the incubator. These projects share many
objectives with REEF (and with each other). We expect these parallel
explorations to converge and differentiate within Apache, as the space
for distributed applications and deployment is too vast for a single
answer.

Apache Twill and REEF both aim to simplify application development on
top of resource managers. However, they go about this in different
ways: Twill simplifies programming by exposing a programming model
based on Java threads. REEF, on the other hand, provides a set of common
building blocks (e.g., job coordination, state passing, cluster
membership) for building big data processing applications, and it
virtualizes the underlying resource managers. None of this prescribes a
specific programming model. As such, REEF sits slightly below Twill
in the architecture stack.

Apache Slider is a framework that makes it easy to deploy and manage
long-running static applications in a YARN cluster. Its focus is on
adapting existing applications such as HBase and Accumulo to run on YARN
with little modification. The goals of Slider and REEF are therefore
different.

Apache Tez is a project to develop a generic Directed Acyclic Graph (DAG)
processing framework with a reusable set of data processing primitives.
Its initial focus is to provide improved data processing capabilities for
projects like Apache Hive, Apache Pig, and Cascading. Tez is nonetheless a
single framework for DAG processing. In contrast, REEF provides a generic
layer on which diverse computation models (DAG, ML, graph processing,
and interactive query processing) can be built. More importantly,
REEF provides a layer that facilitates sharing resources and in-memory
state across frameworks, and it virtualizes resource managers. Regarding
reusable data processing primitives, Tez and REEF share the same
goal, and we hope to collaborate on features that can be shared between
Tez and REEF.

Apache Helix automates application-wide management operations that require
global knowledge and coordination, such as repartitioning of resources and
scheduling of maintenance tasks. Helix separates global coordination
concerns from the functional tasks of the application with a state machine
abstraction. REEF's generic layer makes it easy to program the functional
and management tasks, which may span small or large groups within the
application. Helix can work hand-in-hand with REEF by providing the global
management component for REEF applications.

## An Excessive Fascination with the Apache Brand

The Apache Software Foundation has a reputation as the best place to
host open source projects. We believe that joining the Apache Software
Foundation will attract many developers who want to contribute to
innovation in the Big Data platform space.


# Documentation

The current documentation for REEF is at
https://github.com/Microsoft-CISL/REEF as well as on
http://www.reef-project.org


# Initial Source

The REEF codebase is currently hosted at
https://github.com/Microsoft-CISL/REEF.


# External Dependencies

REEF makes extensive use of the vast array of Java libraries from the
Apache Software Foundation, namely:

 * avro (Apache 2.0)
 * hadoop (Apache 2.0)
 * hdfs (Apache 2.0)
 * yarn (Apache 2.0)
 * commons-cli (Apache 2.0)
 * commons-configuration (Apache 2.0)
 * commons-lang (Apache 2.0)
 * commons-logging (Apache 2.0)

To the best of our knowledge, the external dependencies of REEF are
distributed under Apache-compatible licenses:

 * guava-libraries (Apache 2.0)
 * protobuf (BSD)
 * asm (BSD)
 * netty (Apache 2.0)
 * mockito (MIT)
 * junit (EPL 1.0)
 * slf4j (MIT)


# Cryptography

REEF will depend on secure Hadoop, which can optionally use Kerberos.

# Required Resources

## Mailing Lists

  * reef-private for private PMC discussions
  * reef-dev for technical discussions among contributors and
    notifications about commits

## Git Repository

The REEF team uses Git for source version control:
git://git.apache.org/reef

## Issue Tracking

JIRA REEF (REEF)

## Other Resources

Jenkins continuous integration testing

# Initial Committers

 * Markus Weimer
 * Sergiy Matusevych
 * Julia Wang
 * Shravan M Narayanamurthy
 * Yingda Chen
 * Tony Majestro
 * Beysim Sezgin
 * Boris Shulman
 * Russell Sears
 * Jung Ryong Lee
 * You Sun Jung
 * Dong Joon Hyun
 * Josh Rosen
 * Tyson Condie
 * Brandon Myers
 * Yunseong Lee
 * Taegeon Um
 * Youngseok Yang
 * Brian Cho
 * Byung-Gon Chun

# Affiliations

 * Microsoft:
  * Markus Weimer
  * Sergiy Matusevych
  * Julia Wang
  * Shravan M Narayanamurthy
  * Yingda Chen
  * Tony Majestro
  * Beysim Sezgin
  * Boris Shulman
 * Purestorage:
  * Russell Sears
 * SK Telecom:
  * Jung Ryong Lee
  * You Sun Jung
  * Dong Joon Hyun
 * University of California:
  * Josh Rosen (Berkeley)
  * Tyson Condie (LA)
 * University of Washington:
  * Brandon Myers
 * Seoul National University:
  * Yunseong Lee
  * Taegeon Um
  * Youngseok Yang
  * Brian Cho
  * Byung-Gon Chun


# Sponsors

## Champions
Chris Douglas <cd...@apache.org>

## Nominated Mentors
 * Chris Mattmann <ma...@apache.org>
 * Ross Gardler <rg...@apache.org>
 * Owen O'Malley <om...@apache.org>

## Sponsoring Entity
The Apache Incubator

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Ross Gardler <rg...@opendirective.com>.
[x] +1 Accept REEF into the Incubator






On 8 August 2014 22:40, Byung-Gon Chun <bg...@gmail.com> wrote:

> Hi,
>
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
>
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
>
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
>
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
>
> Thanks!
> -Gon
>
> --
> Byung-Gon Chun

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by "Alan D. Cabrera" <li...@toolazydogs.com>.
+1 binding


Regards,
Alan

On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
> 
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...


Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Till Westmann <ti...@apache.org>.
+1

On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Hi,
>
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
>
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
>
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
>
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
>
> Thanks!
> -Gon
>
> --
> Byung-Gon Chun
>
>
> # REEFProposal - Incubator
>
>
> # Abstract
>
> REEF (Retainable Evaluator Execution Framework) is a scale-out
> computing fabric that eases the development of Big Data applications
> on top of resource managers such as Apache YARN and Mesos.
>
>
> # Proposal
>
> REEF is a Big Data system that makes it easy to implement scalable,
> fault-tolerant runtime environments for a range of data processing
> models (e.g., graph processing and machine learning) on top of
> resource managers such as Apache YARN and Mesos. REEF provides
> capabilities to run multiple heterogeneous frameworks and workflows of
> those efficiently.
>
> Additionally, REEF contains two libraries that are of independent
> value: Wake is an event-based-programming framework inspired by Rx and
> SEDA.  Tang is a dependency injection framework inspired by Google
> Guice, but designed specifically for configuring distributed systems.
>
>
> # Background
>
> The resource management layer such as Apache YARN and Mesos has
> emerged as a critical layer in the new scale-out data processing
> stack; resource managers assume the responsibility of multiplexing a
> cluster of shared-nothing machines across heterogeneous
> applications. They operate behind an interface for leasing containers
> - a slice of a machine’s resources - to computations in an elastic
> fashion. However, building data processing frameworks directly on this
> layer comes at a high cost: each framework must tackle the same
> challenges (e.g., fault-tolerance, task scheduling and coordination)
> and reimplement common mechanisms (e.g., caching, bulk transfers).
>
> REEF provides a reusable control-plane for scheduling and coordinating
> task-level work on cluster resource managers. The REEF design enables
> sophisticated optimizations, such as container re-use and data
> caching, and facilitates workflows that span multiple
> frameworks. Examples include pipelining data between different
> operators in a relational system, retaining state across iterations in
> iterative or recursive data flow, and passing the result of a
> MapReduce job to a Machine Learning computation.
>
>
> # Rationale
>
> Since REEF is a library that makes it easy to write distributed
> applications on top of Apache YARN or Mesos, the Apache Software Foundation
> is the perfect home for hosting REEF.
>
>
> # Current Status
>
> REEF has been developed mostly by Microsoft, UCLA and the Seoul
> National University.  The REEF codebase is open-sourced under Apache
> License 2.0 and is currently hosted in a public repository at
> github.com.
>
>
> # Meritocracy
>
> We plan to build a strong open community by following the Apache
> meritocracy principles. We will work with those who contribute
> significantly to the project and invite them to be its committers.
>
>
> # Community
>
> REEF is currently being used internally at Microsoft.  Also, SK
> Telecom builds their data analytics infrastructure on top of REEF in
> collaboration with Seoul National University.  We hope to extend our
> contributor base by becoming an Apache incubator project. REEF will
> attract developers who are interested in creating common building
> blocks for simplifying the development of large-scale big data
> applications.
>
>
> # Core Developers
>
> Core developers are engineers from Microsoft, Purestorage, UCB, UCLA,
> UW and Seoul National University.
>
>
> # Alignment
>
> REEF depends on many Apache projects and dependencies. REEF is built
> on resource managers such as Apache YARN and Apache Mesos. REEF also
> uses HDFS as a distributed storage layer.
>
>
> # Known Risks
> ## Orphaned Products
>
> The risk of REEF being orphaned is small because Microsoft products
> are built on REEF. The core REEF developers continue to work on REEF
> at Microsoft, UCLA, and Seoul National University. The REEF project is
> gaining interest from other institutions to be used as their
> infrastructure.
>
> ## Inexperience with Open Source
>
> Several core developers have experience with open source development.
> REEF committers will be guided by the mentors with strong Apache open
> source project backgrounds.
>
> ## Homogeneous Developers
>
> The initial committers include developers from several institutions
> including Microsoft, Purestorage, UCB, UCLA, and Seoul National
> University.
>
> ## Reliance on Salaried Developers
>
> Developers from Microsoft are paid to work on REEF. Since the work is
> used internally at Microsoft, Microsoft will keep supporting the
> developers to work on REEF. There are also engineers and graduate
> students that contribute to REEF from UCLA, UCB, UW and Seoul National
> University.  We plan to attract active developers from other
> institutions.
>
> ## Relationships with Other Apache Products
>
> Given REEF's position in the big data stack, there are three
> relationships to consider: Projects that fit below, on top of, or
> alongside REEF in the stack.
>
> ### Below REEF: Mesos and YARN
>
> REEF is designed to facilitate application development on top of
> resource managers.  Hence, its relationship with the aforementioned
> resource managers is symbiotic by design.
>
> ### On Top of REEF
>
> Apache Spark, Giraph, MapReduce and Flink are only some of the
> projects that logically belong at a higher layer of the big data stack
> than REEF.  Of course, none of these today actually are leveraging
> REEF and had to each individually solve some of the issues REEF
> addresses.  It is our goal that REEF will help developers create
> an even richer set of future big data frameworks.
>
> ### Alongside REEF
>
> Apache hosts several projects building intermediate, library layers on
> top of a resource management platform. Twill, Slider, and Tez are
> notable examples in the incubator. These projects share many
> objectives with REEF (and each other).  We expect these parallel
> explorations to converge and differentiate within Apache, as the space
> for distributed applications and deployment is too vast for a single
> answer.
>
> Apache Twill and REEF both aim to simplify application development on
> top of resource managers.  However, REEF and Twill go about this in
> different ways: Twill simplifies programming by exposing a programming
> model, Java Threads.  REEF on the other hand provides a set of common
> building blocks (e.g., job coordination, state passing, cluster
> membership) for building big data processing applications and
> virtualizes underlying resources managers.  None of this prescribes a
> specific programming model.  As such, REEF occupies a slot ever so
> slightly below Twill in an architecture stack.
>
> Apache Slider is a framework to make it easy to deploy and manage
> long-running static applications in a YARN cluster. The focus is to
> adapt existing applications such as HBase and Accumulo to run on YARN
> with little modification. Therefore, the goals of Slider and REEF are
> different.
>
> Apache Tez is a project to develop a generic Directed Acyclic Graph (DAG)
> processing framework with a reusable set of data processing primitives.
> The initial focus is to provide improved data processing capabilities for
> projects like Apache Hive, Apache Pig, and Cascading. Tez is still a single
> framework for DAG processing.  In contrast, REEF provides a generic
> layer on which diverse computation models (DAG, ML, Graph processing,
> and Interactive query processing) can be built.  More importantly,
> REEF provides a layer that facilitates sharing resources and in-memory
> state across frameworks, and it virtualizes resource managers. Regarding
> reusable data processing primitives, Tez and REEF share the same
> goal.  We hope to collaborate on features which can be shared between
> Tez and REEF.
>
> Apache Helix automates application-wide management operations which require
> global knowledge and coordination, such as repartitioning of resources and
> scheduling of maintenance tasks. Helix separates global coordination
> concerns from the functional tasks of the application with a state machine
> abstraction. REEF's generic layer makes it easy to program the functional
> and management tasks, which may span small or large groups within the
> application. Helix can work hand-in-hand with REEF by providing the global
> management component for REEF applications.
>
> ## An Excessive Fascination with the Apache Brand
>
> The Apache Software Foundation has a reputation as the best place to
> host open source projects. We believe that joining the Apache Software
> Foundation will attract many developers who want to contribute to
> innovation in the Big Data platform space.
>
>
> # Documentation
>
> The current documentation for REEF is at
> https://github.com/Microsoft-CISL/REEF as well as on
> http://www.reef-project.org
>
>
> # Initial Source
>
> The REEF codebase is currently hosted at
> https://github.com/Microsoft-CISL/REEF.
>
>
> # External Dependencies
>
> REEF makes extensive use of the vast array of Java libraries from the
> Apache Software Foundation, namely:
>
>  * avro (Apache 2.0)
>  * hadoop (Apache 2.0)
>  * hdfs (Apache 2.0)
>  * yarn (Apache 2.0)
>  * commons-cli (Apache 2.0)
>  * commons-configuration (Apache 2.0)
>  * commons-lang (Apache 2.0)
>  * commons-logging (Apache 2.0)
>
> To the best of our knowledge, the external dependencies of REEF are
> distributed under Apache compatible licenses:
>
>  * guava-libraries (Apache 2.0)
>  * protobuf (BSD)
>  * asm (BSD)
>  * netty (Apache 2.0)
>  * mockito (MIT)
>  * junit (EPL 1.0)
>  * slf4j (MIT)
>
>
> # Cryptography
>
> REEF will depend on secure Hadoop, which can optionally use Kerberos.
>
> # Required Resources
>
> ## Mailing Lists
>
>   * reef-private for private PMC discussions
>   * reef-dev for technical discussions among contributors and
>     notifications about commits
>
> ## Subversion Directory
>
> The REEF team uses Git for source version control:
> git://git.apache.org/reef
>
> ## Issue Tracking
>
> JIRA REEF (REEF)
>
> ## Other Resources
>
> Jenkins continuous integration testing
>
> # Initial Committers
>
>  * Markus Weimer
>  * Sergiy Matusevych
>  * Julia Wang
>  * Shravan M Narayanamurthy
>  * Yingda Chen
>  * Tony Majestro
>  * Beysim Sezgin
>  * Boris Shulman
>  * Russell Sears
>  * Jung Ryong Lee
>  * You Sun Jung
>  * Dong Joon Hyun
>  * Josh Rosen
>  * Tyson Condie
>  * Brandon Myers
>  * Yunseong Lee
>  * Taegeon Um
>  * Youngseok Yang
>  * Brian Cho
>  * Byung-Gon Chun
>
> # Affiliations
>
>  * Microsoft:
>   * Markus Weimer
>   * Sergiy Matusevych
>   * Julia Wang
>   * Shravan M Narayanamurthy
>   * Yingda Chen
>   * Tony Majestro
>   * Beysim Sezgin
>   * Boris Shulman
>  * Purestorage:
>   * Russell Sears
>  * SK Telecom:
>   * Jung Ryong Lee
>   * You Sun Jung
>   * Dong Joon Hyun
>  * University of California:
>   * Josh Rosen (Berkeley)
>   * Tyson Condie (LA)
>  * University of Washington:
>   * Brandon Myers
>  * Seoul National University:
>   * Yunseong Lee
>   * Taegeon Um
>   * Youngseok Yang
>   * Brian Cho
>   * Byung-Gon Chun
>
>
> # Sponsors
>
> ## Champions
> Chris Douglas <cd...@apache.org>
>
> ## Nominated Mentors
>  * Chris Mattmann <ma...@apache.org>
>  * Ross Gardler <rg...@apache.org>
>  * Owen O'Malley <om...@apache.org>
>
> ## Sponsoring Entity
> The Apache Incubator
>

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by jg...@gmail.com.
+1 (binding)

-Jakob

From: Bertrand Delacretaz
Sent: Monday, August 11, 2014 1:16 AM
To: general@incubator.apache.org


On Sat, Aug 9, 2014 at 7:40 AM, Byung-Gon Chun <bg...@gmail.com> wrote:
> ...I would like to call a vote for acceptance of REEF into the
> Apache Incubator...

+1

-Bertrand

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Sat, Aug 9, 2014 at 7:40 AM, Byung-Gon Chun <bg...@gmail.com> wrote:
> ...I would like to call a vote for acceptance of REEF into the
> Apache Incubator...

+1

-Bertrand


Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
> Hi,
>
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
>
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
>
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
>
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...

+1 (binding)

Thanks,
Roman.


Re: [VOTE] Accept REEF into the Apache Incubator

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
+1 binding thanks 

Sent from my iPhone

> On Aug 8, 2014, at 10:40 PM, "Byung-Gon Chun" <bg...@gmail.com> wrote:
> 
> Hi,
> 
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
> 
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
> 
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
> 
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
> 
> Thanks!
> -Gon
> 
> -- 
> Byung-Gon Chun


Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Andrew Purtell <ap...@apache.org>.
+1 (binding)


On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Hi,
>
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
>
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
>
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
>
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
>
> Thanks!
> -Gon
>
> --
> Byung-Gon Chun



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Owen O'Malley <om...@apache.org>.
+1 (binding)


On Mon, Aug 11, 2014 at 6:20 PM, Hitesh Shah <hi...@apache.org> wrote:

> +1 (non-binding)
>
> — Hitesh
>
> On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks for participating in the proposal discussion on REEF. The discussion
> > has calmed. I would like to call a vote for acceptance of REEF into the
> > Apache Incubator.
> >
> > The proposal is attached below, and it is also available at
> > https://wiki.apache.org/incubator/ReefProposal
> >
> > Let's keep this vote open for three business days, closing the voting on
> > August 11, 11:59PM (PDT).
> >
> > [] +1 Accept REEF into the Incubator
> > [] 0 Don't care
> > [] -1 Don't accept REEF because...
> >
> > Thanks!
> > -Gon
> >
> > --
> > Byung-Gon Chun
> >
> >
> > # REEFProposal - Incubator
> >
> >
> > # Abstract
> >
> > REEF (Retainable Evaluator Execution Framework) is a scale-out
> > computing fabric that eases the development of Big Data applications
> > on top of resource managers such as Apache YARN and Mesos.
> >
> >
> > # Proposal
> >
> > REEF is a Big Data system that makes it easy to implement scalable,
> > fault-tolerant runtime environments for a range of data processing
> > models (e.g., graph processing and machine learning) on top of
> > resource managers such as Apache YARN and Mesos. REEF provides
> > capabilities to run multiple heterogeneous frameworks and workflows of
> > those efficiently.
> >
> > Additionally, REEF contains two libraries that are of independent
> > value: Wake is an event-based-programming framework inspired by Rx and
> > SEDA.  Tang is a dependency injection framework inspired by Google
> > Guice, but designed specifically for configuring distributed systems.
> >
> >
> > # Background
> >
> > The resource management layer such as Apache YARN and Mesos has
> > emerged as a critical layer in the new scale-out data processing
> > stack; resource managers assume the responsibility of multiplexing a
> > cluster of shared-nothing machines across heterogeneous
> > applications. They operate behind an interface for leasing containers
> > - a slice of a machine’s resources - to computations in an elastic
> > fashion. However, building data processing frameworks directly on this
> > layer comes at a high cost: each framework must tackle the same
> > challenges (e.g., fault-tolerance, task scheduling and coordination)
> > and reimplement common mechanisms (e.g., caching, bulk transfers).
> >
> > REEF provides a reusable control-plane for scheduling and coordinating
> > task-level work on cluster resource managers. The REEF design enables
> > sophisticated optimizations, such as container re-use and data
> > caching, and facilitates workflows that span multiple
> > frameworks. Examples include pipelining data between different
> > operators in a relational system, retaining state across iterations in
> > iterative or recursive data flow, and passing the result of a
> > MapReduce job to a Machine Learning computation.
> >
> >
> > # Rationale
> >
> > Since REEF is a library that makes it easy to write distributed
> > applications on top of Apache YARN or Mesos, the Apache Software
> Foundation
> > is the perfect home for hosting REEF.
> >
> >
> > # Current Status
> >
> > REEF has been developed mostly by Microsoft, UCLA and the Seoul
> > National University.  The REEF codebase is open-sourced under Apache
> > License 2.0 and is currently hosted in a public repository at
> > github.com.
> >
> >
> > # Meritocracy
> >
> > We plan to build a strong open community by following the Apache
> > meritocracy principles. We will work with those who contribute
> > significantly to the project and invite them to be its committers.
> >
> >
> > # Community
> >
> > REEF is currently being used internally at Microsoft.  Also, SK
> > Telecom builds their data analytics infrastructure on top of REEF in
> > collaboration with Seoul National University.  We hope to extend our
> > contributor base by becoming an Apache incubator project. REEF will
> > attract developers who are interested in creating common building
> > blocks for simplifying the development of large-scale big data
> > applications.
> >
> >
> > # Core Developers
> >
> > Core developers are engineers from Microsoft, Purestorage, UCB, UCLA,
> > UW and Seoul National University.
> >
> >
> > # Alignment
> >
> > REEF depends on many Apache projects and dependencies. REEF is built
> > on resource managers such as Apache YARN and Apache Mesos. REEF also
> > uses HDFS as a distributed storage layer.
> >
> >
> > # Known Risks
> > ## Orphaned Products
> >
> > The risk of REEF being orphaned is small because Microsoft products
> > are built on REEF. The core REEF developers continue to work on REEF
> > at Microsoft, UCLA, and Seoul National University. The REEF project is
> > gaining interest from other institutions to be used as their
> > infrastructure.
> >
> > ## Inexperience with Open Source
> >
> > Several core developers have experience with open source development.
> > REEF committers will be guided by the mentors with strong Apache open
> > source project backgrounds.
> >
> > ## Homogeneous Developers
> >
> > The initial committers include developers from several institutions
> > including Microsoft, Purestorage, UCB, UCLA, and Seoul National
> > University.
> >
> > ## Reliance on Salaried Developers
> >
> > Developers from Microsoft are paid to work on REEF. Since the work is
> > used internally at Microsoft, Microsoft will keep supporting the
> > developers to work on REEF. There are also engineers and graduate
> > students that contribute to REEF from UCLA, UCB, UW and Seoul National
> > University.  We plan to attract active developers from other
> > institutions.
> >
> > ## Relationships with Other Apache Products
> >
> > Given REEF's position in the big data stack, there are three
> > relationships to consider: Projects that fit below, on top of, or
> > alongside REEF in the stack.
> >
> > ### Below REEF: Mesos and YARN
> >
> > REEF is designed to facilitate application development on top of
> > resource managers.  Hence, its relationship with the aforementioned
> > resource managers is symbiotic by design.
> >
> > ### On Top of REEF
> >
> > Apache Spark, Giraph, MapReduce and Flink are only some of the
> > projects that logically belong at a higher layer of the big data stack
> > than REEF.  Of course, none of these today actually are leveraging
> > REEF and had to each individually solve some of the issues REEF
> > addresses.  It is our goal that REEF will help developers create
> > an even richer set of future big data frameworks.
> >
> > ### Alongside REEF
> >
> > Apache hosts several projects building intermediate, library layers on
> > top of a resource management platform. Twill, Slider, and Tez are
> > notable examples in the incubator. These projects share many
> > objectives with REEF (and each other).  We expect these parallel
> > explorations to converge and differentiate within Apache, as the space
> > for distributed applications and deployment is too vast for a single
> > answer.
> >
> > Apache Twill and REEF both aim to simplify application development on
> > top of resource managers.  However, REEF and Twill go about this in
> > different ways: Twill simplifies programming by exposing a programming
> > model, Java Threads.  REEF on the other hand provides a set of common
> > building blocks (e.g., job coordination, state passing, cluster
> > membership) for building big data processing applications and
> > virtualizes underlying resources managers.  None of this prescribes a
> > specific programming model.  As such, REEF occupies a slot ever so
> > slightly below Twill in an architecture stack.
> >
> > Apache Slider is a framework to make it easy to deploy and manage
> > long-running static applications in a YARN cluster. The focus is to
> > adapt existing applications such as HBase and Accumulo to run on YARN
> > with little modification. Therefore, the goals of Slider and REEF are
> > different.
> >
> > Apache Tez is a project to develop a generic Directed Acyclic Graph (DAG)
> > processing framework with a reusable set of data processing primitives.
> > The initial focus is to provide improved data processing capabilities for
> > projects like Apache Hive, Apache Pig, and Cascading. Tez is still a
> single
> > framework for DAG processing.  In contrast, REEF provides a generic
> > layer on which diverse computation models (DAG, ML, Graph processing,
> > and Interactive query processing) can be built.  More importantly,
> > REEF provides a layer that facilitates inter-framework resource and
> > in-memory state use and virtualizes resource managers. Regarding
> > re-usable data processing primitives, Tez and REEF share the same
> > goal.  We hope to collaborate on features which can be shared between
> > Tez and REEF.
> >
> > Apache Helix automates application-wide management operations which
> require
> > global knowledge and coordination, such as repartitioning of resources
> and
> > scheduling of maintenance tasks. Helix separates global coordination
> > concerns from the functional tasks of the application with a state
> machine
> > abstraction. REEF's generic layer makes it easy to program the functional
> > and management tasks, which may span small or large groups within the
> > application. Helix can work hand-in-hand with REEF, by providing the
> global
> > management component for REEF applications.
> >
> > ## An Excessive Fascination with the Apache Brand
> >
> > The Apache Software Foundation has a reputation of being the best place
> to
> > host open source projects. We believe that we will attract many
> developers
> > who want to contribute to innovating in the Big Data platform space by
> > joining the Apache Software Foundation.
> >
> >
> > # Documentation
> >
> > The current documentation for REEF is at
> > https://github.com/Microsoft-CISL/REEF as well as on
> > http://www.reef-project.org
> >
> >
> > # Initial Source
> >
> > The REEF codebase is currently hosted at
> > https://github.com/Microsoft-CISL/REEF.
> >
> >
> > # External Dependencies
> >
> > REEF makes extensive use of the vast array of Java libraries from the
> > Apache Software Foundation, namely:
> >
> > * avro (Apache 2.0)
> > * hadoop (Apache 2.0)
> > * hdfs (Apache 2.0)
> > * yarn (Apache 2.0)
> > * commons-cli (Apache 2.0)
> > * commons-configuration (Apache 2.0)
> > * commons-lang (Apache 2.0)
> > * commons-logging (Apache 2.0)
> >
> > To the best of our knowledge, the external dependencies of REEF are
> > distributed under Apache compatible licenses:
> >
> > * guava-libraries (Apache 2.0)
> > * protobuf (BSD)
> > * asm (BSD)
> > * netty (Apache 2.0)
> > * mockito (MIT)
> > * junit (EPL 1.0)
> > * slf4j (MIT)
> >
> >
> > # Cryptography
> >
> > REEF will depend on secure Hadoop, which can optionally use Kerberos.
> >
> > # Required Resources
> >
> > ## Mailing Lists
> >
> >  * reef-private for private PMC discussions
> >  * reef-dev for technical discussions among contributors and
> >                 notification about commits
> >
> > ## Subversion Directory
> >
> > The REEF team uses Git for source version control:
> > git://git.apache.org/reef
> >
> > ## Issue Tracking
> >
> > JIRA REEF (REEF)
> >
> > ## Other Resources
> >
> > Jenkins continuous integration testing
> >
> > # Initial Committers
> >
> > * Markus Weimer
> > * Sergiy Matusevych
> > * Julia Wang
> > * Shravan M Narayanamurthy
> > * Yingda Chen
> > * Tony Majestro
> > * Beysim Sezgin
> > * Boris Shulman
> > * Russell Sears
> > * Jung Ryong Lee
> > * You Sun Jung
> > * Dong Joon Hyun
> > * Josh Rosen
> > * Tyson Condie
> > * Brandon Myers
> > * Yunseong Lee
> > * Taegeon Um
> > * Youngseok Yang
> > * Brian Cho
> > * Byung-Gon Chun
> >
> > # Affiliations
> >
> > * Microsoft:
> >  * Markus Weimer
> >  * Sergiy Matusevych
> >  * Julia Wang
> >  * Shravan M Narayanamurthy
> >  * Yingda Chen
> >  * Tony Majestro
> >  * Beysim Sezgin
> >  * Boris Shulman
> > * Purestorage:
> >  * Russell Sears
> > * SK Telecom:
> >  * Jung Ryong Lee
> >  * You Sun Jung
> >  * Dong Joon Hyun
> > * University of California:
> >  * Josh Rosen (Berkeley)
> >  * Tyson Condie (LA)
> > * University of Washington:
> >  * Brandon Myers
> > * Seoul National University:
> >  * Yunseong Lee
> >  * Taegeon Um
> >  * Youngseok Yang
> >  * Brian Cho
> >  * Byung-Gon Chun
> >
> >
> > # Sponsors
> >
> > ## Champions
> > Chris Douglas <cd...@apache.org>
> >
> > ## Nominated Mentors
> > * Chris Mattmann <ma...@apache.org>
> > * Ross Gardler <rg...@apache.org>
> > * Owen O'Malley <om...@apache.org>
> >
> > ## Sponsoring Entity
> > The Apache Incubator
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Hitesh Shah <hi...@apache.org>.
+1 (non-binding)

— Hitesh 

On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:



Re: [VOTE] Accept REEF into the Apache Incubator

Posted by jan i <ja...@apache.org>.
On Aug 12, 2014 7:26 PM, "Suresh Srinivas" <su...@hortonworks.com> wrote:
>
> +1 (binding)
+1

>
>
> On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
> >  * hadoop (Apache 2.0)
> >  * hdfs (Apache 2.0)
> >  * yarn (Apache 2.0)
> >  * commons-cli (Apache 2.0)
> >  * commons-configuration (Apache 2.0)
> >  * commons-lang (Apache 2.0)
> >  * commons-logging (Apache 2.0)
> >
> > To the best of our knowledge, the external dependencies of REEF are
> > distributed under Apache compatible licenses:
> >
> >  * guava-libraries (Apache 2.0)
> >  * protobuf (BSD)
> >  * asm (BSD)
> >  * netty (Apache 2.0)
> >  * mockito (MIT)
> >  * junit (EPL 1.0)
> >  * slf4j (MIT)
> >
> >
> > # Cryptography
> >
> > REEF will depend on secure Hadoop, which can optionally use Kerberos.
> >
> > # Required Resources
> >
> > ## Mailing Lists
> >
> >   * reef-private for private PMC discussions
> >   * reef-dev for technical discussions among contributors and
> >                  notification about commits
> >
> > ## Subversion Directory
> >
> > The REEF team uses Git for source version control:
> > git://git.apache.org/reef
> >
> > ## Issue Tracking
> >
> > JIRA REEF (REEF)
> >
> > ## Other Resources
> >
> > Jenkins continuous integration testing
> >
> > # Initial Committers
> >
> >  * Markus Weimer
> >  * Sergiy Matusevych
> >  * Julia Wang
> >  * Shravan M Narayanamurthy
> >  * Yingda Chen
> >  * Tony Majestro
> >  * Beysim Sezgin
> >  * Boris Shulman
> >  * Russell Sears
> >  * Jung Ryong Lee
> >  * You Sun Jung
> >  * Dong Joon Hyun
> >  * Josh Rosen
> >  * Tyson Condie
> >  * Brandon Myers
> >  * Yunseong Lee
> >  * Taegeon Um
> >  * Youngseok Yang
> >  * Brian Cho
> >  * Byung-Gon Chun
> >
> > # Affiliations
> >
> >  * Microsoft:
> >   * Markus Weimer
> >   * Sergiy Matusevych
> >   * Julia Wang
> >   * Shravan M Narayanamurthy
> >   * Yingda Chen
> >   * Tony Majestro
> >   * Beysim Sezgin
> >   * Boris Shulman
> >  * Purestorage:
> >   * Russell Sears
> >  * SK Telecom:
> >   * Jung Ryong Lee
> >   * You Sun Jung
> >   * Dong Joon Hyun
> >  * University of California:
> >   * Josh Rosen (Berkeley)
> >   * Tyson Condie (LA)
> >  * University of Washington:
> >   * Brandon Myers
> >  * Seoul National University:
> >   * Yunseong Lee
> >   * Taegeon Um
> >   * Youngseok Yang
> >   * Brian Cho
> >   * Byung-Gon Chun
> >
> >
> > # Sponsors
> >
> > ## Champions
> > Chris Douglas <cd...@apache.org>
> >
> > ## Nominated Mentors
> >  * Chris Mattmann <ma...@apache.org>
> >  * Ross Gardler <rg...@apache.org>
> >  * Owen O'Malley <om...@apache.org>
> >
> > ## Sponsoring Entity
> > The Apache Incubator
> >

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Suresh Srinivas <su...@hortonworks.com>.
+1 (binding)


On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Hi,
>
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
>
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
>
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
>
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
>
> Thanks!
> -Gon
>
> --
> Byung-Gon Chun



-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Jake Farrell <jf...@apache.org>.
+1 (binding)

-Jake


On Sat, Aug 9, 2014 at 1:40 AM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Hi,
>
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
>
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
>
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
>
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
>
> Thanks!
> -Gon
>
> --
> Byung-Gon Chun

Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Konstantin Boudnik <co...@apache.org>.
+1

On Sat, Aug 09, 2014 at 02:40PM, Byung-Gon Chun wrote:
> Hi,
> 
> Thanks for participating in the proposal discussion on REEF. The discussion
> has calmed. I would like to call a vote for acceptance of REEF into the
> Apache Incubator.
> 
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
> 
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
> 
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
> 
> Thanks!
> -Gon
> 
> -- 
> Byung-Gon Chun
> 
> 
> # REEFProposal - Incubator
> 
> 
> # Abstract
> 
> REEF (Retainable Evaluator Execution Framework) is a scale-out
> computing fabric that eases the development of Big Data applications
> on top of resource managers such as Apache YARN and Mesos.
> 
> 
> # Proposal
> 
> REEF is a Big Data system that makes it easy to implement scalable,
> fault-tolerant runtime environments for a range of data processing
> models (e.g., graph processing and machine learning) on top of
> resource managers such as Apache YARN and Mesos. REEF provides
> capabilities to run multiple heterogeneous frameworks and workflows of
> those efficiently.
> 
> Additionally, REEF contains two libraries that are of independent
> value: Wake is an event-based-programming framework inspired by Rx and
> SEDA.  Tang is a dependency injection framework inspired by Google
> Guice, but designed specifically for configuring distributed systems.
> 
> 
> # Background
> 
> The resource management layer, exemplified by Apache YARN and Mesos, has
> emerged as a critical layer in the new scale-out data processing
> stack; resource managers assume the responsibility of multiplexing a
> cluster of shared-nothing machines across heterogeneous
> applications. They operate behind an interface for leasing containers
> (slices of a machine’s resources) to computations in an elastic
> fashion. However, building data processing frameworks directly on this
> layer comes at a high cost: each framework must tackle the same
> challenges (e.g., fault tolerance, task scheduling and coordination)
> and reimplement common mechanisms (e.g., caching, bulk transfers).
> 
> REEF provides a reusable control plane for scheduling and coordinating
> task-level work on cluster resource managers. The REEF design enables
> sophisticated optimizations, such as container re-use and data
> caching, and facilitates workflows that span multiple
> frameworks. Examples include pipelining data between different
> operators in a relational system, retaining state across iterations in
> iterative or recursive data flow, and passing the result of a
> MapReduce job to a Machine Learning computation.
> 
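The container re-use and data caching idea can be illustrated with a small, self-contained Java simulation. This is a hypothetical sketch, not REEF's API: `Container` stands in for a leased container, `RetainingPool` simulates the resource manager with a lease counter, and a finished task hands its container back to an idle pool instead of releasing it, so the next task reuses both the container and its in-memory cache.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a container leased from a resource manager.
final class Container {
  final int id;
  // In-memory state that outlives any single task run in this container.
  final Map<String, Object> cache = new HashMap<>();

  Container(int id) {
    this.id = id;
  }
}

// Retains containers between tasks instead of returning them to the
// resource manager. Not thread-safe; tasks run one at a time here.
final class RetainingPool {
  private final Deque<Container> idle = new ArrayDeque<>();
  private int leases = 0; // containers "requested" from the simulated resource manager

  private Container lease() {
    leases++;
    return new Container(leases);
  }

  // Runs a task on an idle container if one exists, otherwise leases a new one.
  <R> R run(Function<Container, R> task) {
    Container c = idle.isEmpty() ? lease() : idle.pop();
    try {
      return task.apply(c);
    } finally {
      idle.push(c); // retain the container (and its cache) for the next task
    }
  }

  int leaseCount() {
    return leases;
  }
}

final class RetainDemo {
  public static void main(String[] args) {
    RetainingPool pool = new RetainingPool();
    // Task 1 loads data into the container's cache.
    pool.run(c -> c.cache.put("iteration-input", new int[] {1, 2, 3}));
    // Task 2 reuses the same container and reads the cached data directly.
    Integer sum = pool.run(c -> {
      int[] data = (int[]) c.cache.get("iteration-input");
      int s = 0;
      for (int x : data) {
        s += x;
      }
      return s;
    });
    System.out.println("sum=" + sum + ", leases=" + pool.leaseCount()); // prints "sum=6, leases=1"
  }
}
```

The second task never triggers a new lease, which is the saving the paragraph above describes for iterative computations and multi-framework workflows.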
> 
> # Rationale
> 
> Since REEF is a library that makes it easy to write distributed
> applications on top of Apache YARN or Apache Mesos, both of which are
> Apache projects, the Apache Software Foundation is a natural home for REEF.
> 
> 
> # Current Status
> 
> REEF has been developed mostly by Microsoft, UCLA, and Seoul
> National University.  The REEF codebase is open source under the Apache
> License 2.0 and is currently hosted in a public repository on
> GitHub.
> 
> 
> # Meritocracy
> 
> We plan to build a strong open community by following the Apache
> meritocracy principles. We will work with those who contribute
> significantly to the project and invite them to be its committers.
> 
> 
> # Community
> 
> REEF is currently being used internally at Microsoft.  Also, SK
> Telecom is building its data analytics infrastructure on top of REEF in
> collaboration with Seoul National University.  We hope to extend our
> contributor base by becoming an Apache Incubator project, and we
> expect REEF to attract developers who are interested in creating
> common building blocks that simplify the development of large-scale
> big data applications.
> 
> 
> # Core Developers
> 
> Core developers are engineers from Microsoft, Purestorage, UCB, UCLA,
> UW and Seoul National University.
> 
> 
> # Alignment
> 
> REEF depends on many Apache projects. REEF is built
> on resource managers such as Apache YARN and Apache Mesos. REEF also
> uses HDFS as a distributed storage layer.
> 
> 
> # Known Risks
> ## Orphaned Products
> 
> The risk of REEF being orphaned is small because Microsoft products
> are built on REEF. The core REEF developers continue to work on REEF
> at Microsoft, UCLA, and Seoul National University. Other institutions
> have also expressed interest in using REEF as their infrastructure.
> 
> ## Inexperience with Open Source
> 
> Several core developers have experience with open source development.
> REEF committers will be guided by mentors with strong backgrounds in
> Apache open source projects.
> 
> ## Homogeneous Developers
> 
> The initial committers include developers from several institutions
> including Microsoft, Purestorage, UCB, UCLA, and Seoul National
> University.
> 
> ## Reliance on Salaried Developers
> 
> Developers from Microsoft are paid to work on REEF. Since REEF is
> used internally at Microsoft, Microsoft will keep supporting its
> developers' work on REEF. There are also engineers and graduate
> students who contribute to REEF from UCLA, UCB, UW, and Seoul National
> University.  We plan to attract active developers from other
> institutions.
> 
> ## Relationships with Other Apache Products
> 
> Given REEF's position in the big data stack, there are three
> relationships to consider: projects that fit below, on top of, or
> alongside REEF in the stack.
> 
> ### Below REEF: Mesos and YARN
> 
> REEF is designed to facilitate application development on top of
> resource managers.  Hence, its relationship with the aforementioned
> resource managers is symbiotic by design.
> 
> ### On Top of REEF
> 
> Apache Spark, Giraph, MapReduce and Flink are only some of the
> projects that logically belong at a higher layer of the big data stack
> than REEF.  None of these currently uses REEF; each has had to solve
> some of the issues REEF addresses on its own.  It is our goal that
> REEF will help developers create an even richer set of future big
> data frameworks.
> 
> ### Alongside REEF
> 
> Apache hosts several projects building intermediate library layers on
> top of a resource management platform. Twill, Slider, and Tez are
> notable examples in the incubator. These projects share many
> objectives with REEF (and each other).  We expect these parallel
> explorations to converge and differentiate within Apache, as the space
> for distributed applications and deployment is too vast for a single
> answer.
> 
> Apache Twill and REEF both aim to simplify application development on
> top of resource managers.  However, REEF and Twill go about this in
> different ways: Twill simplifies programming by exposing a programming
> model similar to Java threads.  REEF, on the other hand, provides a set
> of common building blocks (e.g., job coordination, state passing,
> cluster membership) for building big data processing applications and
> virtualizes the underlying resource managers.  None of this prescribes
> a specific programming model.  As such, REEF occupies a slot slightly
> below Twill in an architecture stack.
> 
> Apache Slider is a framework to make it easy to deploy and manage
> long-running static applications in a YARN cluster. The focus is to
> adapt existing applications such as HBase and Accumulo to run on YARN
> with little modification. Therefore, the goals of Slider and REEF are
> different.
> 
> Apache Tez is a project to develop a generic Directed Acyclic Graph (DAG)
> processing framework with a reusable set of data processing primitives.
> The initial focus is to provide improved data processing capabilities for
> projects like Apache Hive, Apache Pig, and Cascading. Tez is still a single
> framework for DAG processing.  In contrast, REEF provides a generic
> layer on which diverse computation models (DAG, ML, Graph processing,
> and Interactive query processing) can be built.  More importantly,
> REEF provides a layer that facilitates sharing resources and in-memory
> state across frameworks, and it virtualizes resource managers. Regarding
> reusable data processing primitives, Tez and REEF share the same
> goal.  We hope to collaborate on features that can be shared between
> Tez and REEF.
> 
> Apache Helix automates application-wide management operations which require
> global knowledge and coordination, such as repartitioning of resources and
> scheduling of maintenance tasks. Helix separates global coordination
> concerns from the functional tasks of the application with a state machine
> abstraction. REEF's generic layer makes it easy to program the functional
> and management tasks, which may span small or large groups within the
> application. Helix can work hand-in-hand with REEF by providing the global
> management component for REEF applications.
> 
> ## An Excessive Fascination with the Apache Brand
> 
> The Apache Software Foundation has a reputation as the best place to
> host open source projects. We believe that joining the Apache Software
> Foundation will help us attract many developers who want to contribute
> to innovation in the Big Data platform space.
> 
> 
> # Documentation
> 
> The current documentation for REEF is at
> https://github.com/Microsoft-CISL/REEF as well as on
> http://www.reef-project.org
> 
> 
> # Initial Source
> 
> The REEF codebase is currently hosted at
> https://github.com/Microsoft-CISL/REEF.
> 
> 
> # External Dependencies
> 
> REEF makes extensive use of Java libraries from the Apache Software
> Foundation, namely:
> 
>  * avro (Apache 2.0)
>  * hadoop (Apache 2.0)
>  * hdfs (Apache 2.0)
>  * yarn (Apache 2.0)
>  * commons-cli (Apache 2.0)
>  * commons-configuration (Apache 2.0)
>  * commons-lang (Apache 2.0)
>  * commons-logging (Apache 2.0)
> 
> To the best of our knowledge, the external dependencies of REEF are
> distributed under Apache compatible licenses:
> 
>  * guava-libraries (Apache 2.0)
>  * protobuf (BSD)
>  * asm (BSD)
>  * netty (Apache 2.0)
>  * mockito (MIT)
>  * junit (EPL 1.0)
>  * slf4j (MIT)
> 
> 
> # Cryptography
> 
> REEF will depend on secure Hadoop, which can optionally use Kerberos.
> 
> # Required Resources
> 
> ## Mailing Lists
> 
>   * reef-private for private PMC discussions
>   * reef-dev for technical discussions among contributors and
>     notifications about commits
> 
> ## Git Repository
> 
> The REEF team uses Git for source version control:
> git://git.apache.org/reef
> 
> ## Issue Tracking
> 
> JIRA REEF (REEF)
> 
> ## Other Resources
> 
> Jenkins continuous integration testing
> 
> # Initial Committers
> 
>  * Markus Weimer
>  * Sergiy Matusevych
>  * Julia Wang
>  * Shravan M Narayanamurthy
>  * Yingda Chen
>  * Tony Majestro
>  * Beysim Sezgin
>  * Boris Shulman
>  * Russell Sears
>  * Jung Ryong Lee
>  * You Sun Jung
>  * Dong Joon Hyun
>  * Josh Rosen
>  * Tyson Condie
>  * Brandon Myers
>  * Yunseong Lee
>  * Taegeon Um
>  * Youngseok Yang
>  * Brian Cho
>  * Byung-Gon Chun
> 
> # Affiliations
> 
>  * Microsoft:
>   * Markus Weimer
>   * Sergiy Matusevych
>   * Julia Wang
>   * Shravan M Narayanamurthy
>   * Yingda Chen
>   * Tony Majestro
>   * Beysim Sezgin
>   * Boris Shulman
>  * Purestorage:
>   * Russell Sears
>  * SK Telecom:
>   * Jung Ryong Lee
>   * You Sun Jung
>   * Dong Joon Hyun
>  * University of California:
>   * Josh Rosen (Berkeley)
>   * Tyson Condie (LA)
>  * University of Washington:
>   * Brandon Myers
>  * Seoul National University:
>   * Yunseong Lee
>   * Taegeon Um
>   * Youngseok Yang
>   * Brian Cho
>   * Byung-Gon Chun
> 
> 
> # Sponsors
> 
> ## Champions
> Chris Douglas <cd...@apache.org>
> 
> ## Nominated Mentors
>  * Chris Mattmann <ma...@apache.org>
>  * Ross Gardler <rg...@apache.org>
>  * Owen O'Malley <om...@apache.org>
> 
> ## Sponsoring Entity
> The Apache Incubator

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [VOTE] Accept REEF into the Apache Incubator

Posted by Chris Douglas <cd...@apache.org>.
+1 -C

On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
