Posted to general@incubator.apache.org by Branko Čibej <br...@apache.org> on 2013/03/13 04:32:02 UTC

[RESULT][VOTE] Accept Tez into Incubator

On 25.02.2013 05:44, Arun C Murthy wrote:
> Thanks to all who voted. Obviously, I'm +1 (binding) on the proposal.
>
> With 14 +1s (10 binding) the vote passes.
>
> I'll start the work to get the podling started.
>
> thanks,
> Arun
>
> On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:
>
>> Hi Folks,
>>
>> Thanks for participating in the discussion. I'd like to call a VOTE for acceptance of Apache Tez into the Incubator. I'll let the vote run until this weekend (Sun 2/24 6pm PST).
>>
>> [ ]  +1 Accept Apache Tez into the Incubator
>> [ ]  +0 Don't care.
>> [ ]  -1 Don't accept Apache Tez into the Incubator because...
>>
>> Full proposal is pasted at the bottom of this email, and the corresponding wiki is http://wiki.apache.org/incubator/TezProposal. 
>>
>> Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts.
>>
>> Here's my +1 (binding).
>>
>> thanks,
>> Arun
>>
>> PS: From the initial discussion, the only changes are that I've added one new mentor and 2 new committers. All the new additions come from outside the majority employer, and we will continue to strive to diversify further during incubation. Thanks.
>>
>> ----
>>
>> = Tez =
>>
>> == Abstract ==
>> Tez is an effort to develop a generic application framework which can be used
>> to process arbitrarily complex data-processing tasks and also a re-usable set
>> of data-processing primitives which can be used by other projects.
>>
>> == Proposal ==
>> Tez is a proposal to develop a generic application which can be used to
>> process complex data-processing task DAGs and which runs natively on Apache
>> Hadoop YARN. YARN is a generic resource-management system on which
>> applications like MapReduce already run. MapReduce is a specific, and
>> constrained, DAG, which is not optimal for several frameworks like Apache Hive
>> and Apache Pig. Furthermore, we propose to develop a re-usable set of
>> libraries of data-processing primitives, such as sorting, merging,
>> data-shuffling and intermediate data management, which are necessary for Tez
>> and which we envision can be used directly by other projects.
>>
>> == Background ==
>> Apache Hadoop MapReduce has emerged as the assembly language on which other
>> frameworks like Apache Pig and Apache Hive have been built. However, it has
>> been well accepted that MapReduce produces very constrained task DAGs for each
>> job which results in Apache Pig and Apache Hive requiring multiple MapReduce
>> jobs for several queries. By providing a more expressive DAG of tasks for a
>> job, Tez attempts to provide significantly enhanced data-processing
>> capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
>>
>> == Rationale ==
>> Tez fills an important gap in the Apache Hadoop ecosystem by
>> allowing more expressive task DAGs for data-processing applications such
>> as Apache Pig, Apache Hive, Cascading etc.
>>
>> With the emergence of Apache Hadoop YARN, there is a strong need for a
>> common DAG application which can then be shared by Apache Pig, Apache Hive,
>> Cascading etc.
>>
>> == Initial Goals ==
>> The initial goals for this project are to specify the detailed requirements
>> and architecture, and then develop the initial implementation including the
>> DAG ApplicationMaster to run natively inside Apache Hadoop YARN. 
>>
>> == Current Status ==
>> Significant work has been completed to identify the initial requirements and
>> define the overall system architecture. There is a patch available in the
>> internal Hortonworks git repository which can act as the initial seed. 
>>
>> === Meritocracy ===
>> We plan to invest in supporting a meritocracy. We will discuss the requirements 
>> in an open forum. Several companies have already expressed interest in this 
>> project, and we intend to invite additional developers to participate. 
>> We will encourage and monitor community participation so that privileges can be 
>> extended to those that contribute. 
>>
>> === Community ===
>> The need for a generic, open-source DAG application for data processing is
>> tremendous, so there is potential for a very large community. We believe
>> that Tez's extensible architecture will further encourage community participation.
>> Also, related Apache projects (e.g., Pig, Hive) have very large and active
>> communities, and we expect that over time Tez will also attract a large community.
>>
>> === Core Developers ===
>> The developers on the initial committers list include people very experienced
>> in the Apache Hadoop ecosystem:
>>
>>  * Alan Gates <gates at apache dot org>
>>  * Arun C Murthy <acmurthy at apache dot org>
>>  * Ashutosh Chauhan <hashutosh at apache dot org>
>>  * Bikas Saha <bikas at apache dot org>
>>  * Chris Douglas <cdouglas at apache dot org>
>>  * Daryn Sharp <daryn at apache dot org>
>>  * Devaraj Das <ddas at apache dot org>
>>  * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>>  * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>>  * Hitesh Shah <hitesh at apache dot org>
>>  * Jason Lowe <jlowe at apache dot org>
>>  * Jean Xu <jeanxu at facebook dot com>
>>  * Jitendra Pandey <jitendra at apache dot org>
>>  * Julien Le Dem <julien at apache dot org>
>>  * Kevin Wilfong <kevinwilfong at apache dot org>
>>  * Mike Liddell <mike dot lidell at microsoft dot com>
>>  * Namit Jain <namit at apache dot org>
>>  * Nathan Roberts <nroberts at yahoo dash inc dot com>
>>  * Owen O'Malley <omalley at apache dot org>
>>  * Robert Evans <bobby at apache dot org>
>>  * Siddharth Seth <sseth at apache dot org>
>>  * Tom White <tomwhite at apache dot org>
>>  * Thomas Graves <tgraves at apache dot org>
>>  * Vikram Dixit <vikram at apache dot org>
>>  * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>>  * William Graham <billgraham at apache dot org>
>>
>> We realize that though we have significant employer diversity already, 
>> additional diversity is always better, and we will work 
>> aggressively to recruit developers from additional companies.
>>
>> === Alignment ===
>> The initial committers strongly believe that a standard task DAG 
>> application on Apache Hadoop YARN will gain broader adoption as an open source, 
>> community driven project, where the community can contribute not only to the 
>> core components, but also to a growing collection of applications which will
>> be based on top of Tez. Our hope is that the Apache Hive, Apache Pig,
>> Cascading and other communities will find tremendous value in Tez and will adopt 
>> it en masse. 
>>
>> == Known Risks ==
>>
>> === Orphaned Products ===
>> The contributors are leading users and vendors in the Apache Hadoop ecosystem, 
>> with significant open source experience, so the risk of being orphaned is 
>> relatively low. The project could be at risk if vendors decided to change 
>> their strategies in the market. In such an event, the current committers 
>> plan to continue working on the project on their own time, though the 
>> progress will likely be slower. We plan to mitigate this risk by 
>> recruiting additional committers.
>>
>> === Inexperience with Open Source ===
>> The initial committers include veteran Apache members (Committers, PMC members
>> and Apache Members) and other developers who have varying degrees of experience 
>> with open source projects. All have been involved with source code that has 
>> been released under an open source license, and several also have experience 
>> developing code with an open source development process.
>>
>> === Homogeneous Developers ===
>> The initial committers are employed by a number of companies, including
>> Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are committed 
>> to recruiting additional committers from other companies based on their 
>> contributions to the project even though we do have significant diversity
>> already. 
>>
>> === Reliance on Salaried Developers ===
>> It is expected that Tez development will occur on both salaried time and on 
>> volunteer time, after hours. The majority of initial committers are paid by 
>> their employer to contribute to this project. However, they are all passionate 
>> about the project, and we are confident that the project will continue even if 
>> no salaried developers contribute to the project. We are committed to recruiting 
>> additional committers including non-salaried developers.
>>
>> === Relationships with Other Apache Products ===
>> As mentioned in the Alignment section, Tez is closely integrated with Hadoop,
>> Hive and Pig in numerous ways. We look forward to collaborating with
>> those communities, as well as other Apache communities. 
>>
>> === An Excessive Fascination with the Apache Brand ===
>> Tez solves a real need for generic task DAG management in the Apache Hadoop
>> ecosystem, something which has been addressed in a very ad hoc manner so far
>> by multiple Apache projects. Our rationale for developing Tez as an Apache 
>> project is detailed in the Rationale section. We believe that the Apache brand 
>> and community process will help us attract more contributors to this project, 
>> and help establish ubiquitous APIs. 
>>
>> == Documentation ==
>> http://wiki.apache.org/incubator/TezProposal
>>
>> == Initial Source ==
>> Available as a patch.
>>
>> == Cryptography ==
>> Tez will eventually support encryption on the wire. This is not one of the initial 
>> goals, and we do not expect Tez to be a controlled export item due to the use 
>> of encryption.
>>
>> == Required Resources ==
>>
>> === Mailing List ===
>>  * tez-private
>>  * tez-dev
>>  * tez-user
>>
>> === Subversion Directory ===
>> Git is the preferred source control system: git://git.apache.org/tez
>>
>> === Issue Tracking ===
>>
>> JIRA Tez (TEZ) 
>>
>> == Initial Committers ==
>>  * Alan Gates <gates at apache dot org>
>>  * Arun C Murthy <acmurthy at apache dot org>
>>  * Ashutosh Chauhan <hashutosh at apache dot org>
>>  * Bikas Saha <bikas at apache dot org>
>>  * Chris Douglas <cdouglas at apache dot org>
>>  * Daryn Sharp <daryn at apache dot org>
>>  * Devaraj Das <ddas at apache dot org>
>>  * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>>  * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>>  * Hitesh Shah <hitesh at apache dot org>
>>  * Jason Lowe <jlowe at apache dot org>
>>  * Jean Xu <jeanxu at facebook dot com>
>>  * Jitendra Pandey <jitendra at apache dot org>
>>  * Julien Le Dem <julien at apache dot org>
>>  * Kevin Wilfong <kevinwilfong at apache dot org>
>>  * Mike Liddell <mike dot lidell at microsoft dot com>
>>  * Namit Jain <namit at apache dot org>
>>  * Nathan Roberts <nroberts at yahoo dash inc dot com>
>>  * Owen O'Malley <omalley at apache dot org>
>>  * Robert Evans <bobby at apache dot org>
>>  * Siddharth Seth <sseth at apache dot org>
>>  * Tom White <tomwhite at apache dot org>
>>  * Thomas Graves <tgraves at apache dot org>
>>  * Vikram Dixit <vikram at apache dot org>
>>  * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>>  * William Graham <billgraham at apache dot org>
>>
>> == Affiliations ==
>> The initial committers are employees of Cloudera, Facebook, Hortonworks,
>> Microsoft, Twitter and Yahoo Inc. 
>>
>>  * Alan Gates - Hortonworks 
>>  * Arun C Murthy - Hortonworks 
>>  * Ashutosh Chauhan - Hortonworks 
>>  * Bikas Saha - Hortonworks 
>>  * Chris Douglas - Microsoft 
>>  * Daryn Sharp - Yahoo 
>>  * Devaraj Das - Hortonworks 
>>  * Gopal Vijayaraghavan - Hortonworks 
>>  * Gunther Hagleitner - Hortonworks 
>>  * Hitesh Shah - Hortonworks 
>>  * Jason Lowe - Yahoo 
>>  * Jean Xu - Facebook 
>>  * Jitendra Pandey - Hortonworks 
>>  * Julien Le Dem - Twitter
>>  * Kevin Wilfong - Facebook 
>>  * Mike Liddell - Microsoft 
>>  * Namit Jain - Facebook 
>>  * Nathan Roberts - Yahoo 
>>  * Owen O'Malley - Hortonworks
>>  * Robert Evans - Yahoo 
>>  * Siddharth Seth - Hortonworks 
>>  * Tom White - Cloudera 
>>  * Thomas Graves - Yahoo 
>>  * Vikram Dixit - Hortonworks 
>>  * Vinod Kumar Vavilapalli - Hortonworks 
>>  * William Graham - Twitter 
>>
>> The nominated mentors are employees of Hortonworks, LinkedIn, 
>> NASA JPL and Microsoft.
>>  
>>  * Alan Gates - Hortonworks 
>>  * Arun C Murthy - Hortonworks 
>>  * Chris Douglas - Microsoft 
>>  * Chris Mattmann - NASA JPL
>>  * Jakob Homan - LinkedIn 
>>  * Owen O'Malley - Hortonworks 
>>
>> == Sponsors ==
>>
>> === Champion ===
>> Arun C Murthy <acmurthy at apache dot org>
>>
>> === Nominated Mentors ===
>>  * Alan Gates <gates at apache dot org> - Architect at Hortonworks. Committer for Pig.
>>  * Arun C Murthy <acmurthy at apache dot org> - Architect at Hortonworks. Committer for Hadoop.
>>  * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at Microsoft. Committer for Hadoop.
>>  * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist, NASA JPL. Committer for Nutch, OODT and Tika.
>>  * Jakob Homan <jghoman at apache dot org> - Sr. Software Engineer, LinkedIn. Committer for Hadoop, Kafka, Giraph.
>>  * Owen O'Malley <omalley at apache dot org> - Architect at Hortonworks. Committer for Hadoop, Ambari.
>>
>> === Sponsoring Entity ===
>> Incubator
>>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

