Posted to general@incubator.apache.org by Srikanth Sundarrajan <sr...@inmobi.com> on 2013/03/13 18:00:01 UTC

[PROPOSAL] Ivory - Hadoop data management and processing platform

= Ivory Proposal =

== Abstract ==
Ivory is a data processing and management solution for Hadoop designed for
data motion, coordination of data pipelines, lifecycle management, and
data discovery. Ivory enables end consumers to quickly onboard their data
and its associated processing and management tasks on Hadoop clusters.

== Proposal ==
Ivory will enable easy data management via a declarative mechanism for
Hadoop. Users of the Ivory platform simply define infrastructure endpoints,
data sets and processing rules declaratively. These configurations
are expressed in such a way that the dependencies between
these entities are explicitly described. This information about
inter-dependencies between various entities allows Ivory to orchestrate and
manage various data management functions.
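
As a rough illustration of the idea (not Ivory's actual configuration
format, which this proposal does not specify), the sketch below models
entities that name their dependencies explicitly and derives an ordering
from them. The entity names, the depends_on field and both helper functions
are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarations: each entity lists the entities it depends on.
ENTITIES = {
    "primary-cluster": {"type": "cluster", "depends_on": []},
    "raw-clicks": {"type": "feed", "depends_on": ["primary-cluster"]},
    "hourly-clicks": {"type": "feed", "depends_on": ["primary-cluster"]},
    "aggregate-clicks": {
        "type": "process",
        "depends_on": ["primary-cluster", "raw-clicks", "hourly-clicks"],
    },
}


def submission_order(entities):
    """Return entity names so that every dependency precedes its dependents."""
    graph = {name: set(spec["depends_on"]) for name, spec in entities.items()}
    # static_order() raises graphlib.CycleError if the declarations are cyclic.
    return list(TopologicalSorter(graph).static_order())


def dependents_of(entities, target):
    """Return the names of entities that directly depend on `target`."""
    return sorted(
        name for name, spec in entities.items() if target in spec["depends_on"]
    )
```

Because the dependencies are declared rather than implied, a platform can
answer questions such as "what must exist before this process is scheduled"
or "what is affected if this feed changes" without inspecting job code.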

The key use cases that Ivory addresses are:
 * Data Motion
 * Process orchestration and scheduling
 * Policy-based Lifecycle Management
 * Data Discovery
 * Operability/Usability

With these features it is possible for users to onboard their data sets with
a comprehensive and holistic understanding of how, when and where their data
is managed across its lifecycle. Complex functions such as retrying failures,
identifying possible SLA breaches or automated handling of input data changes
are now simple directives. All the administrative functions and user level
functions are available via RESTful APIs. The CLI is simply a wrapper over
the RESTful APIs.
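
To make "simple directives" concrete, here is a minimal sketch of how a
retry directive might be interpreted by a generic executor. It is an
assumption for illustration only: the policy keys (attempts, delay_seconds,
backoff) and the run_with_retry helper are hypothetical and are not taken
from Ivory.

```python
import time

# Hypothetical retry directive, as a user might declare it next to a process.
RETRY_POLICY = {"attempts": 3, "delay_seconds": 10, "backoff": "exponential"}


def run_with_retry(task, policy, sleep=time.sleep):
    """Call task() until it succeeds or the declared attempts are used up."""
    attempts = policy["attempts"]
    delay = policy["delay_seconds"]
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            if policy["backoff"] == "exponential":
                sleep(delay * 2 ** (attempt - 1))  # 10s, 20s, 40s, ...
            else:
                sleep(delay)  # fixed delay between attempts
```

The user states only the policy; the looping, waiting and error propagation
live in the platform, which is the sense in which retries become a directive
rather than custom code.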

== Background ==
Hadoop and its ecosystem of products have made storing and processing massive
amounts of data commonplace. This has enabled numerous organizations to gain
valuable insights that they never could have achieved in the past. While it
is easy to leverage Hadoop for crunching large volumes of data, organizing
data, managing the life cycle of data and processing data are fairly involved.
These problems are solved adequately well in a classic data platform involving
data warehouses and standard ETL (extract-transform-load) tools, but remain
largely unsolved on Hadoop today. In addition to data processing complexities,
Hadoop presents new sets of challenges and opportunities relating to
management of data.

Data Management on Hadoop encompasses data motion, process orchestration,
lifecycle management and data discovery, among other concerns that are beyond
ETL. Ivory is a new data processing and management platform for Hadoop that
solves this problem and creates additional opportunities by building on
existing components within the Hadoop ecosystem (e.g. Apache Oozie, Apache
Hadoop DistCp) without reinventing the wheel. Ivory has been in production at
InMobi, going on its second year, and has been managing hundreds of feeds and
processes.

Ivory is being developed by engineers employed by InMobi, Hortonworks and
Yahoo!. This platform addition will increase the adoption of Apache Hadoop by
making data management tractable for end users. We are therefore proposing to
make Ivory an Apache open source project.

== Rationale ==
The Ivory project aims to improve the usability of Apache Hadoop. As a result,
Apache Hadoop will grow its community of users by increasing the places Hadoop
can be utilized and the use cases it will solve. By developing Ivory in Apache
we hope to gather a diverse community of contributors, helping to ensure that
Ivory is deployable for a broad range of scenarios. Members of the Hadoop
development community will be able to influence Ivory’s roadmap and contribute
to it. We believe having Ivory as part of the Apache Hadoop ecosystem will be
a great benefit to all of Hadoop's users.

== Current Status ==
Ivory is widely deployed in production within InMobi and is moving on to its
second year. A version with a valuable set of features has been developed by
the initial committers and is hosted on GitHub.

=== Meritocracy ===
Our intent with this incubator proposal is to start building a diverse
developer community around Ivory following the Apache meritocracy model. We
have wanted to make the project open source and encourage contributors from
multiple organizations from the start. We plan to provide plenty of support to
new developers and to quickly recruit those who make solid contributions to
committer status.

=== Community ===
We are happy to report that the initial team already represents multiple
organizations. We hope to extend the user and developer base further in the
future and build a solid open source community around Ivory.

=== Core Developers ===
Ivory is currently being developed by three engineers from InMobi (Srikanth
Sundarrajan, Shwetha G S, and Shaik Idris) and two Hortonworks employees
(Sanjay Radia and Venkatesh Seetharam). In addition, two Yahoo! employees,
Rohini Palaniswamy and Thiruvel Thirumoolan, are also involved. Srikanth,
Shwetha and Shaik are the original developers. All the engineers have built
two generations of Data Management on Hadoop, have deep expertise in Hadoop,
and are quite familiar with the Hadoop ecosystem.

=== Alignment ===
The ASF is a natural host for Ivory given that it is already the home of
Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software projects.
Ivory has been designed to solve the data management challenges and
opportunities of the Hadoop ecosystem family of products. Ivory fills the gap
that the Hadoop ecosystem has had in the areas of data processing and data
lifecycle management.

== Known Risks ==

=== Orphaned Products ===
The core developers plan to work full time on the project. There is very
little risk of Ivory getting orphaned. Ivory is in use by the companies we
work for, so the companies have an interest in its continued vitality.

=== Inexperience with Open Source ===
All of the core developers are active users and followers of open source.
Srikanth Sundarrajan has been contributing patches to Apache Hadoop and Apache
Oozie; Shwetha GS has been contributing patches to Apache Oozie. Seetharam
Venkatesh is a committer on Apache Knox. Rohini Palaniswamy is a committer on
Apache Pig. Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member)
and Sanjay Radia are PMC members on Apache Hadoop.

=== Homogeneous Developers ===
The current core developers are from a diverse set of organizations such as
InMobi, Hortonworks, and Yahoo!. We expect to quickly establish a developer
community that includes contributors from several corporations post
incubation.

=== Reliance on Salaried Developers ===
Currently, most developers are paid to work on Ivory, but a few are
contributing in their spare time. However, once the project has a community
built around it post incubation, we expect to get committers and developers
from outside the current core developers.

=== Relationships with Other Apache Products ===
Ivory is going to be used by the users of Hadoop and the Hadoop ecosystem in
general.

=== An Excessive Fascination with the Apache Brand ===
While we respect the reputation of the Apache brand and have no doubts that it
will attract contributors and users, our interest is primarily to give Ivory a
solid home as an open source project following an established development
model. We have also given reasons in the Rationale and Alignment sections.

== Documentation ==
There is documentation in the GitHub repository at:
https://github.com/sriksun/Ivory

== Initial Source ==
The source is currently in the GitHub repository at:
https://github.com/sriksun/Ivory

== Source and Intellectual Property Submission Plan ==
The complete Ivory code is under the Apache License, Version 2.0.

== External Dependencies ==
The dependencies all have Apache-compatible licenses. These include BSD- and
MIT-licensed dependencies.

== Cryptography ==
None

== Required Resources ==

=== Mailing lists ===

 * ivory-dev AT incubator DOT apache DOT org
 * ivory-commits AT incubator DOT apache DOT org
 * ivory-user AT incubator DOT apache DOT org
 * ivory-private AT incubator DOT apache DOT org

=== Subversion Directory ===
https://svn.apache.org/repos/asf/incubator/ivory

=== Issue Tracking ===
JIRA IVORY

== Initial Committers ==
 * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
 * Shwetha GS (shwetha.gs AT inmobi DOT com)
 * Shaik Idris (shaik.idris AT inmobi DOT com)
 * Venkatesh Seetharam (Venkatesh AT apache DOT com)
 * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
 * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
 * Sanjay Radia (sanjay AT apache DOT org)
 * Sharad Agarwal (sharad AT apache DOT org)
 * Amareshwari SR (amareshwari AT apache DOT org)

== Affiliations ==
 * Srikanth Sundarrajan (InMobi)
 * Shwetha GS (InMobi)
 * Shaik Idris (InMobi)
 * Venkatesh Seetharam (Hortonworks Inc)
 * Rohini Palaniswamy (Yahoo! Inc)
 * Thiruvel Thirumoolan (Yahoo! Inc)
 * Sanjay Radia (Hortonworks Inc)
 * Sharad Agarwal (InMobi)
 * Amareshwari SR (InMobi)

== Sponsors ==

=== Champion ===
 * Arun C Murthy (acmurthy AT apache DOT org)

=== Nominated Mentors ===
 * Alan Gates (gates AT apache DOT org)
 * Chris Douglas (cdouglas AT apache DOT org)
 * Devaraj Das (ddas AT apache DOT org)
 * Owen O’Malley (omalley AT apache DOT org)

=== Sponsoring Entity ===
Incubator PMC


Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Niall Pemberton <ni...@gmail.com>.

+1

Niall

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Roman Shaposhnik <rv...@apache.org>.

On Fri, Mar 15, 2013 at 11:09 AM, Seetharam Venkatesh
<ve...@innerzeal.com> wrote:
> Hi Henry,
>
> Is there a concern with the current name? The closest is a tool for
> Information Retrieval. Not sure if there is an overlap.  We will also bring
> this up with the champion and mentors to see if this needs to be vetted with
> the trademarks folks as well.

I think there's a *bit* of a concern (nothing blocking, mind you!). The way
I see it, one of the major points of going through incubation is to
bootstrap your community: not only a community of developers, but of users
as well. In my experience there's no greater enemy to the community-building
endeavor than confusion. And given that the first thing I see
on Google for Hadoop Ivory is this: http://lintool.github.com/Ivory/
I'd say it is confusing.

Just my 2c.

Thanks,
Roman.

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Henry Saputra <he...@gmail.com>.

+1 =)

On Fri, Mar 15, 2013 at 5:22 PM, Joe Schaefer <jo...@yahoo.com> wrote:

> Can we pretty-please do this *before* resources
> are requested, just to save us poor infra saps
> the trouble of renaming everything?
>
> >________________________________
> >From: Jakob Homan <jg...@gmail.com>
> >To: general@incubator.apache.org
> >Sent: Friday, March 15, 2013 8:18 PM
> >Subject: Re: [PROPOSAL] Ivory - Hadoop data management and processing platform
> >
> >As part of Incubation a suitable name search will be done to verify the
> >name's appropriate.  I imagine Ivory would fail this test based on the
> >prior project, so this Ivory would need to find a new name.  Alternatively,
> >before the vote, the Ivory folks can find another name.  This has happened
> >before (Howl -> HCatalog), so it's not a huge reason to be concerned.
> >
> >On Fri, Mar 15, 2013 at 5:15 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> >
> >> It would be awfully nice of you not to stomp on another hadoop ecosystem
> >> project's google-fu when your project becomes very successful and admired
> >> across the hadoopverse :)
> >>
> >> Ivory isn't a fly-by-night project someone threw up on github -- it's
> >> generated over a dozen peer-reviewed papers, and has many watchers and dev
> >> forks.
> >>
> >> I don't have a vote here, but I'd say that yes, this will lead to confusion
> >> when people look for hadoop ivory.
> >>
> >> D
> >>
> >> On Fri, Mar 15, 2013 at 11:09 AM, Seetharam Venkatesh <
> >> venkatesh@innerzeal.com> wrote:
> >>
> >> > Hi Henry,
> >> >
> >> > Is there a concern with the current name? The closest is a tool for
> >> > Information Retrieval. Not sure if there is an overlap.  We will also bring
> >> > this up with the champion and mentors to see if this needs to be vetted with
> >> > the trademarks folks as well.
> >> >
> >> > Your suggestions are welcome.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > On Fri, Mar 15, 2013 at 10:18 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
> >> >
> >> > > Hi Srikanth,
> >> > >
> >> > > So does the Ivory name stay or once the podling nears graduation it will try
> >> > > to find another name?
> >> > >
> >> > > - Henry
> >> > >
> >> > >
> >> > > On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
> >> > > srikanth.sundarrajan@inmobi.com> wrote:
> >> > >
> >> > > > Made a few edits to the proposal (
> >> > > > http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
> >> > > > received so far.
> >> > > >
> >> > > > Regards
> >> > > > Srikanth Sundarrajan
> >> > > >
> >> > > > = Ivory Proposal =
> >> > > >
> >> > > > == Abstract ==
> >> > > > Ivory is a data processing and management solution for Hadoop designed
> >> > > > for data motion, coordination of data pipelines, lifecycle management,
> >> > > > and data discovery. Ivory enables end consumers to quickly onboard
> >> > > > their data and its associated processing and management tasks on
> >> > > > Hadoop clusters.
> >> > > >
> >> > > > == Proposal ==
> >> > > > Ivory will enable easy data management via declarative mechanism for
> >> > > > Hadoop. Users of Ivory platform simply define infrastructure
> >> > > > endpoints, data sets and processing rules declaratively. These
> >> > > > declarative configurations are expressed in such a way that the
> >> > > > dependencies between these configured entities are explicitly
> >> > > > described. This information about inter-dependencies between various
> >> > > > entities allows Ivory to orchestrate and manage various data
> >> > > > management functions.
> >> > > >
> >> > > > The key use cases that Ivory addresses are:
> >> > > >  * Data Motion
> >> > > >  * Process orchestration and scheduling
> >> > > >  * Policy-based Lifecycle Management
> >> > > >  * Data Discovery
> >> > > >  * Operability/Usability
> >> > > >
> >> > > > With these features it is possible for users to onboard their data
> >> > > > sets with a comprehensive and holistic understanding of how, when and
> >> > > > where their data is managed across its lifecycle. Complex functions
> >> > > > such as retrying failures, identifying possible SLA breaches or
> >> > > > automated handling of input data changes are now simple directives.
> >> > > > All the administrative functions and user level functions are
> >> > > > available via RESTful APIs. CLI is simply a wrapper over the RESTful
> >> > > > APIs.
> >> > > >
> >> > > > == Background ==
> >> > > > Hadoop and its ecosystem of products have made storing and processing
> >> > > > massive amounts of data commonplace. This has enabled numerous
> >> > > > organizations to gain valuable insights that they never could have
> >> > > > achieved in the past. While it is easy to leverage Hadoop for
> >> > > > crunching large volumes of data, organizing data, managing life cycle
> >> > > > of data and processing data is fairly involved. This is solved
> >> > > > adequately well in a classic data platform involving data warehouses
> >> > > > and standard ETL (extract-transform-load) tools, but remains largely
> >> > > > unsolved today. In addition to data processing complexities, Hadoop
> >> > > > presents new sets of challenges and opportunities relating to
> >> > > > management of data.
> >> > > >
> >> > > > Data Management on Hadoop encompasses data motion, process
> >> > > > orchestration, lifecycle management, data discovery, etc. among other
> >> > > > concerns that are beyond ETL. Ivory is a new data processing and
> >> > > > management platform for Hadoop that solves this problem and creates
> >> > > > additional opportunities by building on existing components within the
> >> > > > Hadoop ecosystem (ex. Apache Oozie, Apache Hadoop DistCp etc.) without
> >> > > > reinventing the wheel. Ivory has been in production at InMobi, going
> >> > > > on its second year and has been managing hundreds of feeds and
> >> > > > processes.
> >> > > >
> >> > > > Ivory is being developed by engineers employed with InMobi and
> >> > > > Hortonworks. This platform addition will increase the adoption of
> >> > > > Apache Hadoop by making data management tractable for end users. We
> >> > > > are therefore proposing to make Ivory an Apache open source project.
> >> > > >
> >> > > > == Rationale ==
> >> > > > The Ivory project aims to improve the usability of Apache Hadoop. As a
> >> > > > result Apache Hadoop will grow its community of users by increasing
> >> > > > the places Hadoop can be utilized and the use cases it will solve. By
> >> > > > developing Ivory in Apache we hope to gather a diverse community of
> >> > > > contributors, helping to ensure that Ivory is deployable for a broad
> >> > > > range of scenarios. Members of the Hadoop development community will
> >> > > > be able to influence Ivory’s roadmap, and contribute to it. We believe
> >> > > > having Ivory as part of the Apache Hadoop ecosystem will be a great
> >> > > > benefit to all of Hadoop's users.
> >> > > >
> >> > > > == Current Status ==
> >> > > > Ivory is widely deployed in production within InMobi and moving on to
> >> > > > its second year. A version with a valuable set of features is
> >> > > > developed by the list of initial committers and is hosted on github.
> >> > > >
> >> > > > === Meritocracy ===
> >> > > > Our intent with this incubator proposal is to start building a diverse
> >> > > > developer community around Ivory following the Apache meritocracy
> >> > > > model. We have wanted to make the project open source and encourage
> >> > > > contributors from multiple organizations from the start. We plan to
> >> > > > provide plenty of support to new developers and to quickly recruit
> >> > > > those who make solid contributions to committer status.
> >> > > >
> >> > > > === Community ===
> >> > > > We are happy to report that the initial team already represents
> >> > > > multiple organizations. We hope to extend the user and developer base
> >> > > > further in the future and build a solid open source community around
> >> > > > Ivory.
> >> > > >
> >> > > > === Core Developers ===
> >> > > > Ivory is currently being developed by three engineers from InMobi –
> >> > > > Srikanth Sundarrajan, Shwetha G S, and Shaik Idris, two Hortonworks
> >> > > > employees – Sanjay Radia and Venkatesh Seetharam. In addition, Rohini
> >> > > > Palaniswamy and Thiruvel Thirumoolan, were also involved in the
> >> > > > initial design discussions. Srikanth, Shwetha and Shaik are the
> >> > > > original developers. All the engineers have built two generations of
> >> > > > Data Management on Hadoop, having deep expertise in Hadoop and are
> >> > > > quite familiar with the Hadoop Ecosystem. Samarth Gupta & Rishu
> >> > > > Mehrothra, both from InMobi have built the QA automation for Ivory.
> >> > > >
> >> > > > === Alignment ===
> >> > > > The ASF is a natural host for Ivory given that it is already the home
> >> > > > of Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> >> > > > projects. Ivory has been designed to solve the data management
> >> > > > challenges and opportunities of the Hadoop ecosystem family of
> >> > > > products. Ivory fills the gap that Hadoop ecosystem has been lacking
> >> > > > in the areas of data processing and data lifecycle management.
> >> > > >
> >> > > > == Known Risks ==
> >> > > >
> >> > > > === Orphaned products & Reliance on Salaried Developers ===
> >> > > > The core developers plan to work full time on the project. There is
> >> > > > very little risk of Ivory getting orphaned. Ivory is in use by the
> >> > > > companies we work for, so the companies have an interest in its
> >> > > > continued vitality.
> >> > > >
> >> > > > === Inexperience with Open Source ===
> >> > > > All of the core developers are active users and followers of open
> >> > > > source. Srikanth Sundarrajan has been contributing patches to Apache
> >> > > > Hadoop and Apache Oozie, and Shwetha GS has been contributing patches to
> >> > > > Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox.
> >> > > > Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member) and
> >> > > > Sanjay Radia are PMC members on Apache Hadoop.
> >> > > >
> >> > > > === Homogeneous Developers ===
> >> > > > The current core developers are from a diverse set of organizations such
> >> > > > as InMobi and Hortonworks. We expect to quickly establish a developer
> >> > > > community that includes contributors from several corporations post
> >> > > > incubation.
> >> > > >
> >> > > > === Reliance on Salaried Developers ===
> >> > > > Currently, most developers are paid to work on Ivory, but a few are
> >> > > > contributing in their spare time. However, once the project has a
> >> > > > community built around it post incubation, we expect to get committers
> >> > > > and developers from outside the current core developers.
> >> > > >
> >> > > > === Relationships with Other Apache Products ===
> >> > > > Ivory is going to be used by the users of Hadoop and the Hadoop
> >> > > > ecosystem in general.
> >> > > >
> >> > > > === An Excessive Fascination with the Apache Brand ===
> >> > > > While we respect the reputation of the Apache brand and have no doubts
> >> > > > that it will attract contributors and users, our interest is primarily
> >> > > > to give Ivory a solid home as an open source project following an
> >> > > > established development model. We have also given reasons in the
> >> > > > Rationale and Alignment sections.
> >> > > >
> >> > > > == Documentation ==
> >> > > > http://wiki.apache.org/incubator/IvoryProposal
> >> > > >
> >> > > > == Initial Source ==
> >> > > > The source is currently in a GitHub repository at:
> >> > > > https://github.com/sriksun/Ivory
> >> > > >
> >> > > > == Source and Intellectual Property Submission Plan ==
> >> > > > The complete Ivory code is under Apache Software License 2.
> >> > > >
> >> > > > == External Dependencies ==
> >> > > > The dependencies all have Apache-compatible licenses. These include
> >> > > > BSD- and MIT-licensed dependencies.
> >> > > >
> >> > > > == Cryptography ==
> >> > > > None
> >> > > >
> >> > > > == Required Resources ==
> >> > > >
> >> > > > === Mailing lists ===
> >> > > >
> >> > > >  * ivory-dev AT incubator DOT apache DOT org
> >> > > >  * ivory-commits AT incubator DOT apache DOT org
> >> > > >  * ivory-user AT incubator DOT apache DOT org
> >> > > >  * ivory-private AT incubator DOT apache DOT org
> >> > > >
> >> > > > === Subversion Directory ===
> >> > > > Git is the preferred source control system: git://git.apache.org/ivory
> >> > > >
> >> > > > === Issue Tracking ===
> >> > > > JIRA IVORY
> >> > > >
> >> > > > == Initial Committers ==
> >> > > >  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> >> > > >  * Shwetha GS (shwetha.gs AT inmobi DOT com)
> >> > > >  * Shaik Idris (shaik.idris AT inmobi DOT com)
> >> > > >  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
> >> > > >  * Sanjay Radia (sanjay AT apache DOT org)
> >> > > >  * Sharad Agarwal (sharad AT apache DOT org)
> >> > > >  * Amareshwari SR (amareshwari AT apache DOT org)
> >> > > >  * Samarth Gupta (samarth.gupta AT inmobi DOT com)
> >> > > >  * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)
> >> > > >
> >> > > > == Affiliations ==
> >> > > >  * Srikanth Sundarrajan (InMobi)
> >> > > >  * Shwetha GS (InMobi)
> >> > > >  * Shaik Idris (InMobi)
> >> > > >  * Venkatesh Seetharam (Hortonworks Inc.)
> >> > > >  * Sanjay Radia (Hortonworks Inc.)
> >> > > >  * Sharad Agarwal (InMobi)
> >> > > >  * Amareshwari SR (InMobi)
> >> > > >  * Samarth Gupta (InMobi)
> >> > > >  * Rishu Mehrothra (InMobi)
> >> > > >
> >> > > > == Sponsors ==
> >> > > >
> >> > > > === Champion ===
> >> > > >  * Arun C Murthy (acmurthy AT apache DOT org)
> >> > > >
> >> > > > === Nominated Mentors ===
> >> > > >  * Alan Gates (gates AT apache DOT org)
> >> > > >  * Chris Douglas (cdouglas AT apache DOT org)
> >> > > >  * Devaraj Das (ddas AT apache DOT org)
> >> > > >  * Owen O’Malley (omalley AT apache DOT org)
> >> > > >
> >> > > > === Sponsoring Entity ===
> >> > > > Incubator PMC
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Venkatesh
> >> >
> >> > http://in.linkedin.com/in/seetharamvenkatesh
> >> > http://about.me/SeetharamVenkatesh
> >> >
> >> > “Perfection (in design) is achieved not when there is nothing more to
> >> add,
> >> > but rather when there is nothing more to take away.”
> >> > - Antoine de Saint-Exupéry
> >> >
> >>
> >
> >
> >
>

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Ted Dunning <te...@gmail.com>.

Also the name of the dominant credit card fraud detection system.

Everybody loves the name.

On Thu, Mar 21, 2013 at 6:16 AM, David Jencks <da...@yahoo.com> wrote:

> Falcon is also the name of a database engine:
>
> http://en.wikipedia.org/wiki/Falcon_(storage_engine)
>
> the name of a programming language
>
> http://falconpl.org/project_docs/core/index.html
>
> and very close to the name of some kind of Oracle add-on vendor:
>
> http://www.falconstor.com/solutions/business-applications/oracle-database-solutions
>
> david jencks
>
> On Mar 20, 2013, at 10:02 PM, Srikanth Sundarrajan <srikanth.sundarrajan@inmobi.com> wrote:
>
> > Hi Justin,
> >    I am assuming it won't be an issue as Falcon used within the
> > Adobe/Apache Flex isn't related to Hadoop.
> >
> > Regards
> > Srikanth Sundarrajan
> >
> > On Thu, Mar 21, 2013 at 10:23 AM, Justin Mclean <justinmclean@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> JFYI Falcon is already a name used by Adobe and Apache Flex. It's an AS
> >> compiler and an experimental AS to JS compiler (Falcon JS) - not sure if
> >> that is an issue or not.
> >>
> >> Justin
> >>
> >>
> >
>
>
>
>

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by David Jencks <da...@yahoo.com>.

Falcon is also the name of a database engine:

http://en.wikipedia.org/wiki/Falcon_(storage_engine)

the name of a programming language

http://falconpl.org/project_docs/core/index.html

and very close to the name of some kind of Oracle add-on vendor:

http://www.falconstor.com/solutions/business-applications/oracle-database-solutions

david jencks

On Mar 20, 2013, at 10:02 PM, Srikanth Sundarrajan <sr...@inmobi.com> wrote:

> Hi Justin,
>    I am assuming it won't be an issue as Falcon used within the
> Adobe/Apache Flex isn't related to Hadoop.
> 
> Regards
> Srikanth Sundarrajan
> 
> On Thu, Mar 21, 2013 at 10:23 AM, Justin Mclean <ju...@gmail.com> wrote:
> 
>> Hi,
>> 
>> JFYI Falcon is already a name used by Adobe and Apache Flex. It's an AS
>> compiler and an experimental AS to JS compiler (Falcon JS) - not sure if
>> that is an issue or not.
>> 
>> Justin
>> 
>> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Justin Mclean <ju...@gmail.com>.

Hi,

> I agree with that - if the Flex PMC thinks otherwise they should speak up now.

I don't see any issues (different software space) but will ask the rest of the Flex PMC.

Thanks,
Justin

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Thu, Mar 21, 2013 at 6:02 AM, Srikanth Sundarrajan
<sr...@inmobi.com> wrote:
>...I am assuming it won't be an issue as Falcon used within the
> Adobe/Apache Flex isn't related to Hadoop...

I agree with that - if the Flex PMC thinks otherwise they should speak up now.

-Bertrand


Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Srikanth Sundarrajan <sr...@inmobi.com>.

Hi Justin,
    I am assuming it won't be an issue as Falcon used within the
Adobe/Apache Flex isn't related to Hadoop.

Regards
Srikanth Sundarrajan

On Thu, Mar 21, 2013 at 10:23 AM, Justin Mclean <ju...@gmail.com> wrote:

> Hi,
>
> JFYI Falcon is already a name used by Adobe and Apache Flex. It's an AS
> compiler and an experimental AS to JS compiler (Falcon JS) - not sure if
> that is an issue or not.
>
> Justin
>
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Justin Mclean <ju...@gmail.com>.

Hi,

JFYI Falcon is already a name used by Adobe and Apache Flex. It's an AS compiler and an experimental AS to JS compiler (Falcon JS) - not sure if that is an issue or not.

Justin

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Srikanth Sundarrajan <sr...@inmobi.com>.

As there were a few concerns relating to the name of the project, we are
renaming this project to Falcon. The proposal has been updated accordingly.
(PS: http://wiki.apache.org/incubator/FalconProposal)

= Falcon Proposal =

== Abstract ==
Falcon is a data processing and management solution for Hadoop
designed for data motion, coordination of data pipelines, lifecycle
management, and data discovery. Falcon enables end consumers to
quickly onboard their data and its associated processing and
management tasks on Hadoop clusters.

== Proposal ==
Falcon will enable easy data management via a declarative mechanism for
Hadoop. Users of the Falcon platform simply define infrastructure
endpoints, data sets and processing rules declaratively. These
declarative configurations are expressed in such a way that the
dependencies between these configured entities are explicitly
described. This information about inter-dependencies between various
entities allows Falcon to orchestrate and manage various data
management functions.

The key use cases that Falcon addresses are:
 * Data Motion
 * Process orchestration and scheduling
 * Policy-based Lifecycle Management
 * Data Discovery
 * Operability/Usability

With these features it is possible for users to onboard their data
sets with a comprehensive and holistic understanding of how, when and
where their data is managed across its lifecycle. Complex functions
such as retrying failures, identifying possible SLA breaches or
automated handling of input data changes are now simple directives.
All the administrative functions and user level functions are
available via RESTful APIs. CLI is simply a wrapper over the RESTful
APIs.

== Background ==
Hadoop and its ecosystem of products have made storing and processing
massive amounts of data commonplace. This has enabled numerous
organizations to gain valuable insights that they never could have
achieved in the past. While it is easy to leverage Hadoop for
crunching large volumes of data, organizing data, managing the life cycle
of data and processing data are fairly involved. This is solved
adequately in a classic data platform involving data warehouses
and standard ETL (extract-transform-load) tools, but remains largely
unsolved on Hadoop today. In addition to data processing complexities, Hadoop
presents new sets of challenges and opportunities relating to
management of data.

Data Management on Hadoop encompasses data motion, process
orchestration, lifecycle management and data discovery, among other
concerns that are beyond ETL. Falcon is a new data processing and
management platform for Hadoop that solves this problem and creates
additional opportunities by building on existing components within the
Hadoop ecosystem (e.g. Apache Oozie, Apache Hadoop DistCp) without
reinventing the wheel. Falcon has been in production at InMobi, going
on its second year, and has been managing hundreds of feeds and
processes.

Falcon is being developed by engineers employed with InMobi and
Hortonworks. This platform addition will increase the adoption of
Apache Hadoop by making data management tractable for end users. We
are therefore proposing to make Falcon an Apache open source project.

== Rationale ==
The Falcon project aims to improve the usability of Apache Hadoop. As
a result Apache Hadoop will grow its community of users by increasing
the places Hadoop can be utilized and the use cases it will solve. By
developing Falcon in Apache we hope to gather a diverse community of
contributors, helping to ensure that Falcon is deployable for a broad
range of scenarios. Members of the Hadoop development community will
be able to influence Falcon’s roadmap, and contribute to it. We
believe having Falcon as part of the Apache Hadoop ecosystem will be a
great benefit to all of Hadoop's users.

== Current Status ==
Falcon is widely deployed in production within InMobi and is moving into
its second year. A version with a valuable set of features has been
developed by the initial committers and is hosted on GitHub.

=== Meritocracy ===
Our intent with this incubator proposal is to start building a diverse
developer community around Falcon following the Apache meritocracy
model. We have wanted to make the project open source and encourage
contributors from multiple organizations from the start. We plan to
provide plenty of support to new developers and to quickly recruit
those who make solid contributions to committer status.

=== Community ===
We are happy to report that the initial team already represents
multiple organizations. We hope to extend the user and developer base
further in the future and build a solid open source community around
Falcon.

=== Core Developers ===
Falcon is currently being developed by three engineers from InMobi –
Srikanth Sundarrajan, Shwetha GS, and Shaik Idris – and two Hortonworks
employees – Sanjay Radia and Venkatesh Seetharam. In addition, Rohini
Palaniswamy and Thiruvel Thirumoolan were also involved in the
initial design discussions. Srikanth, Shwetha and Shaik are the
original developers. All the engineers have built two generations of
Data Management on Hadoop, have deep expertise in Hadoop and are
quite familiar with the Hadoop Ecosystem. Samarth Gupta and Rishu
Mehrothra, both from InMobi, have built the QA automation for Falcon.

=== Alignment ===
The ASF is a natural host for Falcon given that it is already the home
of Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
projects. Falcon has been designed to solve the data management
challenges and opportunities of the Hadoop ecosystem family of
products. Falcon fills a gap in the Hadoop ecosystem
in the areas of data processing and data lifecycle management.

== Known Risks ==

=== Orphaned products & Reliance on Salaried Developers ===
The core developers plan to work full time on the project. There is
very little risk of Falcon getting orphaned. Falcon is in use by the
companies we work for, so the companies have an interest in its
continued vitality.

=== Inexperience with Open Source ===
All of the core developers are active users and followers of open
source. Srikanth Sundarrajan has been contributing patches to Apache
Hadoop and Apache Oozie, and Shwetha GS has been contributing patches to
Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox.
Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member) and
Sanjay Radia are PMC members on Apache Hadoop.

=== Homogeneous Developers ===
The current core developers are from a diverse set of organizations such
as InMobi and Hortonworks. We expect to quickly establish a developer
community that includes contributors from several corporations post
incubation.

=== Reliance on Salaried Developers ===
Currently, most developers are paid to work on Falcon, but a few are
contributing in their spare time. However, once the project has a
community built around it post incubation, we expect to get committers
and developers from outside the current core developers.

=== Relationships with Other Apache Products ===
Falcon is going to be used by the users of Hadoop and the Hadoop
ecosystem in general.

=== An Excessive Fascination with the Apache Brand ===
While we respect the reputation of the Apache brand and have no doubts
that it will attract contributors and users, our interest is primarily
to give Falcon a solid home as an open source project following an
established development model. We have also given reasons in the
Rationale and Alignment sections.

== Documentation ==
http://wiki.apache.org/incubator/FalconProposal

== Initial Source ==
The source is currently in a GitHub repository at:
https://github.com/sriksun/Falcon

== Source and Intellectual Property Submission Plan ==
The complete Falcon code is under the Apache License, Version 2.0.

== External Dependencies ==
The dependencies all have Apache-compatible licenses. These include
BSD- and MIT-licensed dependencies.

== Cryptography ==
None

== Required Resources ==

=== Mailing lists ===

 * falcon-dev AT incubator DOT apache DOT org
 * falcon-commits AT incubator DOT apache DOT org
 * falcon-user AT incubator DOT apache DOT org
 * falcon-private AT incubator DOT apache DOT org

=== Subversion Directory ===
Git is the preferred source control system: git://git.apache.org/falcon

=== Issue Tracking ===
JIRA FALCON

== Initial Committers ==
 * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
 * Shwetha GS (shwetha.gs AT inmobi DOT com)
 * Shaik Idris (shaik.idris AT inmobi DOT com)
 * Venkatesh Seetharam (Venkatesh AT apache DOT org)
 * Sanjay Radia (sanjay AT apache DOT org)
 * Sharad Agarwal (sharad AT apache DOT org)
 * Amareshwari SR (amareshwari AT apache DOT org)
 * Samarth Gupta (samarth.gupta AT inmobi DOT com)
 * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)

== Affiliations ==
 * Srikanth Sundarrajan (InMobi)
 * Shwetha GS (InMobi)
 * Shaik Idris (InMobi)
 * Venkatesh Seetharam (Hortonworks Inc.)
 * Sanjay Radia (Hortonworks Inc.)
 * Sharad Agarwal (InMobi)
 * Amareshwari SR (InMobi)
 * Samarth Gupta (InMobi)
 * Rishu Mehrothra (InMobi)

== Sponsors ==

=== Champion ===
 * Arun C Murthy (acmurthy AT apache DOT org)

=== Nominated Mentors ===
 * Alan Gates (gates AT apache DOT org)
 * Chris Douglas (cdouglas AT apache DOT org)
 * Devaraj Das (ddas AT apache DOT org)
 * Owen O’Malley (omalley AT apache DOT org)

=== Sponsoring Entity ===
Incubator PMC



On Sat, Mar 16, 2013 at 5:52 AM, Joe Schaefer <jo...@yahoo.com> wrote:

> Can we pretty-please do this *before* resources
> are requested, just to save us poor infra saps
> the trouble of renaming everything?
>
>
>
>
>
> >________________________________
> > From: Jakob Homan <jg...@gmail.com>
> >To: general@incubator.apache.org
> >Sent: Friday, March 15, 2013 8:18 PM
> >Subject: Re: [PROPOSAL] Ivory - Hadoop data management and processing
> platform
> >
> >As part of Incubation a suitable name search will be done to verify the
> >name's appropriate.  I imagine Ivory would fail this test based on the
> >prior project, so this Ivory would need to find a new name.
> Alternatively,
> >before the vote, the Ivory folks can find another name.  This has happened
> >before (Howl -> HCatalog), so it's not a huge reason to be concerned.
> >
> >
> >On Fri, Mar 15, 2013 at 5:15 PM, Dmitriy Ryaboy <dv...@gmail.com>
> wrote:
> >
> >> It would be awfully nice of you not to stomp on another hadoop ecosystem
> >> project's google-fu when your project becomes very successful and
> admired
> >> across the hadoopverse :)
> >>
> >> Ivory isn't a fly-by-night project someone threw up on github -- it's
> >> generated over a dozen peer-reviewed papers, and has many watchers and
> dev
> >> forks.
> >>
> >> I don't have a vote here, but I'd say that yes, this will lead to
> confusion
> >> when people look for hadoop ivory.
> >>
> >> D
> >>
> >>
> >> On Fri, Mar 15, 2013 at 11:09 AM, Seetharam Venkatesh <
> >> venkatesh@innerzeal.com> wrote:
> >>
> >> > Hi Henry,
> >> >
> >> > Is there a concern with the current name? The closest is a tool for
> >> > Information Retrieval. Not sure if there is an overlap. We will also bring
> >> > this up with the champion and mentors to see if this needs to be vetted with
> >> > the trademarks folks as well.
> >> >
> >> > Your suggestions are welcome.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > On Fri, Mar 15, 2013 at 10:18 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
> >> >
> >> > > Hi Srikanth,
> >> > >
> >> > > So does the Ivory name stay, or once the podling nears graduation will it
> >> > > try to find another name?
> >> > >
> >> > > - Henry
> >> > >
> >> > >
> >> > > On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
> >> > > srikanth.sundarrajan@inmobi.com> wrote:
> >> > >
> >> > > > Made a few edits to the proposal
> >> > > > (http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
> >> > > > received so far.
> >> > > >
> >> > > > Regards
> >> > > > Srikanth Sundarrajan
> >> > > >
> >> > > > = Ivory Proposal =
> >> > > >
> >> > > > == Abstract ==
> >> > > > Ivory is a data processing and management solution for Hadoop
> >> designed
> >> > > > for data motion, coordination of data pipelines, lifecycle
> >> management,
> >> > > > and data discovery. Ivory enables end consumers to quickly onboard
> >> > > > their data and its associated processing and management tasks on
> >> > > > Hadoop clusters.
> >> > > >
> >> > > > == Proposal ==
> >> > > > Ivory will enable easy data management via declarative mechanism
> for
> >> > > > Hadoop. Users of Ivory platform simply define infrastructure
> >> > > > endpoints, data sets and processing rules declaratively. These
> >> > > > declarative configurations are expressed in such a way that the
> >> > > > dependencies between these configured entities are explicitly
> >> > > > described. This information about inter-dependencies between
> various
> >> > > > entities allows Ivory to orchestrate and manage various data
> >> > > > management functions.
> >> > > >
> >> > > > The key use cases that Ivory addresses are:
> >> > > >  * Data Motion
> >> > > >  * Process orchestration and scheduling
> >> > > >  * Policy-based Lifecycle Management
> >> > > >  * Data Discovery
> >> > > >  * Operability/Usability
> >> > > >
> >> > > > With these features it is possible for users to onboard their data
> >> > > > sets with a comprehensive and holistic understanding of how, when
> and
> >> > > > where their data is managed across its lifecycle. Complex
> functions
> >> > > > such as retrying failures, identifying possible SLA breaches or
> >> > > > automated handling of input data changes are now simple
> directives.
> >> > > > All the administrative functions and user level functions are
> >> > > > available via RESTful APIs. CLI is simply a wrapper over the
> RESTful
> >> > > > APIs.
> >> > > >
> >> > > > == Background ==
> >> > > > Hadoop and its ecosystem of products have made storing and
> processing
> >> > > > massive amounts of data commonplace. This has enabled numerous
> >> > > > organizations to gain valuable insights that they never could have
> >> > > > achieved in the past. While it is easy to leverage Hadoop for
> >> > > > crunching large volumes of data, organizing data, managing life
> cycle
> >> > > > of data and processing data is fairly involved. This is solved
> >> > > > adequately well in a classic data platform involving data
> warehouses
> >> > > > and standard ETL (extract-transform-load) tools, but remains
> largely
> >> > > > unsolved today. In addition to data processing complexities,
> Hadoop
> >> > > > presents new sets of challenges and opportunities relating to
> >> > > > management of data.
> >> > > >
> >> > > > Data Management on Hadoop encompasses data motion, process
> >> > > > orchestration, lifecycle management, data discovery, etc. among
> other
> >> > > > concerns that are beyond ETL. Ivory is a new data processing and
> >> > > > management platform for Hadoop that solves this problem and
> creates
> >> > > > additional opportunities by building on existing components within
> >> the
> >> > > > Hadoop ecosystem (ex. Apache Oozie, Apache Hadoop DistCp etc.)
> >> without
> >> > > > reinventing the wheel. Ivory has been in production at InMobi,
> going
> >> > > > on its second year and has been managing hundreds of feeds and
> >> > > > processes.
> >> > > >
> >> > > > Ivory is being developed by engineers employed with InMobi and
> >> > > > Hortonworks. This platform addition will increase the adoption of
> >> > > > Apache Hadoop by driving data management tractable for end users.
> We
> >> > > > are therefore proposing to make Ivory an Apache open source
> project.
> >> > > >
> >> > > > == Rationale ==
> >> > > > The Ivory project aims to improve the usability of Apache Hadoop.
> As
> >> a
> >> > > > result Apache Hadoop will grow its community of users by
> increasing
> >> > > > the places Hadoop can be utilized and the use cases it will
> solve. By
> >> > > > developing Ivory in Apache we hope to gather a diverse community
> of
> >> > > > contributors, helping to ensure that Ivory is deployable for a
> broad
> >> > > > range of scenarios. Members of the Hadoop development community
> will
> >> > > > be able to influence Ivory’s roadmap, and contribute to it. We
> >> believe
> >> > > > having Ivory as part of the Apache Hadoop ecosystem will be a
> great
> >> > > > benefit to all of Hadoop's users.
> >> > > >
> >> > > > == Current Status ==
> >> > > > Ivory is widely deployed in production within InMobi and moving
> on to
> >> > > > its second year. A version with a valuable set of features is
> >> > > > developed by the list of initial committers and is hosted on
> github.
> >> > > >
> >> > > > === Meritocracy ===
> >> > > > Our intent with this incubator proposal is to start building a
> >> diverse
> >> > > > developer community around Ivory following the Apache meritocracy
> >> > > > model. We have wanted to make the project open source and
> encourage
> >> > > > contributors from multiple organizations from the start. We plan
> to
> >> > > > provide plenty of support to new developers and to quickly recruit
> >> > > > those who make solid contributions to committer status.
> >> > > >
> >> > > > === Community ===
> >> > > > We are happy to report that the initial team already represents
> >> > > > multiple organizations. We hope to extend the user and developer
> base
> >> > > > further in the future and build a solid open source community
> around
> >> > > > Ivory.
> >> > > >
> >> > > > === Core Developers ===
> >> > > > Ivory is currently being developed by three engineers from InMobi –
> >> > > > Srikanth Sundarrajan, Shwetha GS, and Shaik Idris – and two Hortonworks
> >> > > > employees – Sanjay Radia and Venkatesh Seetharam. In addition, Rohini
> >> > > > Palaniswamy and Thiruvel Thirumoolan were also involved in the
> >> > > > initial design discussions. Srikanth, Shwetha and Shaik are the
> >> > > > original developers. All the engineers have built two generations of
> >> > > > Data Management on Hadoop, have deep expertise in Hadoop and are
> >> > > > quite familiar with the Hadoop Ecosystem. Samarth Gupta and Rishu
> >> > > > Mehrothra, both from InMobi, have built the QA automation for Ivory.
> >> > > >
> >> > > > === Alignment ===
> >> > > > The ASF is a natural host for Ivory given that it is already the
> >> > > > home of Hadoop, Pig, Knox, HCatalog, and other emerging “big data”
> >> > > > software projects. Ivory has been designed to address the data
> >> > > > management challenges and opportunities of the Hadoop ecosystem
> >> > > > family of products. Ivory fills a gap in the Hadoop ecosystem in
> >> > > > the areas of data processing and data lifecycle management.
> >> > > >
> >> > > > == Known Risks ==
> >> > > >
> >> > > > === Orphaned products & Reliance on Salaried Developers ===
> >> > > > The core developers plan to work full time on the project. There
> is
> >> > > > very little risk of Ivory getting orphaned. Ivory is in use by
> >> > > > companies we work for so the companies have an interest in its
> >> > > > continued vitality.
> >> > > >
> >> > > > === Inexperience with Open Source ===
> >> > > > All of the core developers are active users and followers of open
> >> > > > source. Srikanth Sundarrajan has been contributing patches to
> >> > > > Apache Hadoop and Apache Oozie, and Shwetha GS has been contributing
> >> > > > patches to Apache Oozie. Seetharam Venkatesh is a committer on
> >> > > > Apache Knox. Sharad Agarwal, Amareshwari SR (also an Apache Hive
> >> > > > PMC member) and Sanjay Radia are PMC members on Apache Hadoop.
> >> > > >
> >> > > > === Homogeneous Developers ===
> >> > > > The current core developers are from a diverse set of
> >> > > > organizations, such as InMobi and Hortonworks. We expect to quickly
> >> > > > establish a developer community that includes contributors from
> >> > > > several corporations post incubation.
> >> > > >
> >> > > > === Reliance on Salaried Developers ===
> >> > > > Currently, most developers are paid to work on Ivory, but a few are
> >> > > > contributing in their spare time. However, once the project has a
> >> > > > community built around it post incubation, we expect to get
> >> > > > committers and developers from outside the current core developers.
> >> > > >
> >> > > > === Relationships with Other Apache Products ===
> >> > > > Ivory is going to be used by the users of Hadoop and the Hadoop
> >> > > > ecosystem in general.
> >> > > >
> >> > > > === An Excessive Fascination with the Apache Brand ===
> >> > > > While we respect the reputation of the Apache brand and have no
> >> doubts
> >> > > > that it will attract contributors and users, our interest is
> >> primarily
> >> > > > to give Ivory a solid home as an open source project following an
> >> > > > established development model. We have also given reasons in the
> >> > > > Rationale and Alignment sections.
> >> > > >
> >> > > > == Documentation ==
> >> > > > http://wiki.apache.org/incubator/IvoryProposal
> >> > > >
> >> > > > == Initial Source ==
> >> > > > The source is currently in github repository at:
> >> > > > https://github.com/sriksun/Ivory
> >> > > >
> >> > > > == Source and Intellectual Property Submission Plan ==
> >> > > > The complete Ivory code is under Apache Software License 2.
> >> > > >
> >> > > > == External Dependencies ==
> >> > > > The dependencies all have Apache-compatible licenses. These include
> >> > > > BSD- and MIT-licensed dependencies.
> >> > > >
> >> > > > == Cryptography ==
> >> > > > None
> >> > > >
> >> > > > == Required Resources ==
> >> > > >
> >> > > > === Mailing lists ===
> >> > > >
> >> > > >  * ivory-dev AT incubator DOT apache DOT org
> >> > > >  * ivory-commits AT incubator DOT apache DOT org
> >> > > >  * ivory-user AT incubator DOT apache DOT org
> >> > > >  * ivory-private AT incubator DOT apache DOT org
> >> > > >
> >> > > > === Subversion Directory ===
> >> > > > Git is the preferred source control system:
> >> > > > git://git.apache.org/ivory
> >> > > >
> >> > > > === Issue Tracking ===
> >> > > > JIRA IVORY
> >> > > >
> >> > > > == Initial Committers ==
> >> > > >  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> >> > > >  * Shwetha GS (shwetha.gs AT inmobi DOT com)
> >> > > >  * Shaik Idris (shaik.idris AT inmobi DOT com)
> >> > > >  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
> >> > > >  * Sanjay Radia (sanjay AT apache DOT org)
> >> > > >  * Sharad Agarwal (sharad AT apache DOT org)
> >> > > >  * Amareshwari SR (amareshwari AT apache DOT org)
> >> > > >  * Samarth Gupta (samarth.gupta AT inmobi DOT com)
> >> > > >  * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)
> >> > > >
> >> > > > == Affiliations ==
> >> > > >  * Srikanth Sundarrajan (InMobi)
> >> > > >  * Shwetha GS (InMobi)
> >> > > >  * Shaik Idris (InMobi)
> >> > > >  * Venkatesh Seetharam (Hortonworks Inc.)
> >> > > >  * Sanjay Radia (Hortonworks Inc.)
> >> > > >  * Sharad Agarwal (InMobi)
> >> > > >  * Amareshwari SR (InMobi)
> >> > > >  * Samarth Gupta (InMobi)
> >> > > >  * Rishu Mehrothra (InMobi)
> >> > > >
> >> > > > == Sponsors ==
> >> > > >
> >> > > > === Champion ===
> >> > > >  * Arun C Murthy (acmurthy at apache dot org)
> >> > > >
> >> > > > === Nominated Mentors ===
> >> > > >  * Alan Gates (gates AT apache DOT org)
> >> > > >  * Chris Douglas (cdouglas AT apache DOT org)
> >> > > >  * Devaraj  Das (ddas AT apache DOT org)
> >> > > >  * Owen O’Malley (omalley AT apache DOT org)
> >> > > >
> >> > > > === Sponsoring Entity ===
> >> > > > Incubator PMC
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Venkatesh
> >> >
> >> > http://in.linkedin.com/in/seetharamvenkatesh
> >> > http://about.me/SeetharamVenkatesh
> >> >
> >> > “Perfection (in design) is achieved not when there is nothing more to
> >> add,
> >> > but rather when there is nothing more to take away.”
> >> > - Antoine de Saint-Exupéry
> >> >
> >>
> >
> >
> >

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Joe Schaefer <jo...@yahoo.com>.

Can we pretty-please do this *before* resources
are requested, just to save us poor infra saps
the trouble of renaming everything?





>________________________________
> From: Jakob Homan <jg...@gmail.com>
>To: general@incubator.apache.org 
>Sent: Friday, March 15, 2013 8:18 PM
>Subject: Re: [PROPOSAL] Ivory - Hadoop data management and processing platform
> 
>As part of Incubation a suitable name search will be done to verify the
>name's appropriate.  I imagine Ivory would fail this test based on the
>prior project, so this Ivory would need to find a new name.  Alternatively,
>before the vote, the Ivory folks can find another name.  This has happened
>before (Howl -> HCatalog), so it's not a huge reason to be concerned.
>
>
>On Fri, Mar 15, 2013 at 5:15 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>
>> It would be awfully nice of you not to stomp on another hadoop ecosystem
>> project's google-fu when your project becomes very successful and admired
>> across the hadoopverse :)
>>
>> Ivory isn't a fly-by-night project someone threw up on github -- it's
>> generated over a dozen peer-reviewed papers, and has many watchers and dev
>> forks.
>>
>> I don't have a vote here, but I'd say that yes, this will lead to confusion
>> when people look for hadoop ivory.
>>
>> D
>>
>>
>> On Fri, Mar 15, 2013 at 11:09 AM, Seetharam Venkatesh <
>> venkatesh@innerzeal.com> wrote:
>>
>> > Hi Henry,
>> >
>> > Is there a concern with the current name? The closest is a tool for
>> > Information Retrieval. Not sure if there is an overlap.  We will also
>> > bring this up with the champion and mentors to see if this needs to be
>> > vetted with the trademarks folks as well.
>> >
>> > Your suggestions are welcome.
>> >
>> > Thanks!
>> >
>> >
>> > On Fri, Mar 15, 2013 at 10:18 AM, Henry Saputra <henry.saputra@gmail.com
>> > >wrote:
>> >
>> > > Hi Srikanth,
>> > >
>> > > So does the Ivory name stay, or will the podling try to find another
>> > > name once it nears graduation?
>> > >
>> > > - Henry
>> > >
>> > >
>> > > On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
>> > > srikanth.sundarrajan@inmobi.com> wrote:
>> > >
>> > > > Made a few edits to the proposal
>> > > > (http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
>> > > > received so far.
>> > > >
>> > > > Regards
>> > > > Srikanth Sundarrajan
>> > > >
>> > > > = Ivory Proposal =
>> > > >
>> > > > == Abstract ==
>> > > > Ivory is a data processing and management solution for Hadoop
>> designed
>> > > > for data motion, coordination of data pipelines, lifecycle
>> management,
>> > > > and data discovery. Ivory enables end consumers to quickly onboard
>> > > > their data and its associated processing and management tasks on
>> > > > Hadoop clusters.
>> > > >
>> > > > == Proposal ==
>> > > > Ivory will enable easy data management via declarative mechanism for
>> > > > Hadoop. Users of Ivory platform simply define infrastructure
>> > > > endpoints, data sets and processing rules declaratively. These
>> > > > declarative configurations are expressed in such a way that the
>> > > > dependencies between these configured entities are explicitly
>> > > > described. This information about inter-dependencies between various
>> > > > entities allows Ivory to orchestrate and manage various data
>> > > > management functions.
>> > > >
>> > > > The key use cases that Ivory addresses are:
>> > > >  * Data Motion
>> > > >  * Process orchestration and scheduling
>> > > >  * Policy-based Lifecycle Management
>> > > >  * Data Discovery
>> > > >  * Operability/Usability
>> > > >
>> > > > With these features it is possible for users to onboard their data
>> > > > sets with a comprehensive and holistic understanding of how, when and
>> > > > where their data is managed across its lifecycle. Complex functions
>> > > > such as retrying failures, identifying possible SLA breaches or
>> > > > automated handling of input data changes are now simple directives.
>> > > > All the administrative functions and user level functions are
>> > > > available via RESTful APIs. CLI is simply a wrapper over the RESTful
>> > > > APIs.
>> > > >
>> > > > == Background ==
>> > > > Hadoop and its ecosystem of products have made storing and processing
>> > > > massive amounts of data commonplace. This has enabled numerous
>> > > > organizations to gain valuable insights that they never could have
>> > > > achieved in the past. While it is easy to leverage Hadoop for
>> > > > crunching large volumes of data, organizing data, managing life cycle
>> > > > of data and processing data is fairly involved. This is solved
>> > > > adequately well in a classic data platform involving data warehouses
>> > > > and standard ETL (extract-transform-load) tools, but remains largely
>> > > > unsolved today. In addition to data processing complexities, Hadoop
>> > > > presents new sets of challenges and opportunities relating to
>> > > > management of data.
>> > > >
>> > > > Data Management on Hadoop encompasses data motion, process
>> > > > orchestration, lifecycle management, and data discovery, among other
>> > > > concerns that are beyond ETL. Ivory is a new data processing and
>> > > > management platform for Hadoop that solves this problem and creates
>> > > > additional opportunities by building on existing components within
>> > > > the Hadoop ecosystem (e.g. Apache Oozie and Apache Hadoop DistCp)
>> > > > without reinventing the wheel. Ivory has been in production at
>> > > > InMobi, going on its second year, and has been managing hundreds of
>> > > > feeds and processes.
>> > > >
>> > > > Ivory is being developed by engineers employed with InMobi and
>> > > > Hortonworks. This platform addition will increase the adoption of
>> > > > Apache Hadoop by making data management tractable for end users. We
>> > > > are therefore proposing to make Ivory an Apache open source project.
>> > > >
>> > > > == Rationale ==
>> > > > The Ivory project aims to improve the usability of Apache Hadoop. As
>> a
>> > > > result Apache Hadoop will grow its community of users by increasing
>> > > > the places Hadoop can be utilized and the use cases it will solve. By
>> > > > developing Ivory in Apache we hope to gather a diverse community of
>> > > > contributors, helping to ensure that Ivory is deployable for a broad
>> > > > range of scenarios. Members of the Hadoop development community will
>> > > > be able to influence Ivory’s roadmap, and contribute to it. We
>> believe
>> > > > having Ivory as part of the Apache Hadoop ecosystem will be a great
>> > > > benefit to all of Hadoop's users.
>> > > >
>> > > > == Current Status ==
>> > > > Ivory is widely deployed in production within InMobi and moving on to
>> > > > its second year. A version with a valuable set of features is
>> > > > developed by the list of initial committers and is hosted on github.
>> > > >
>> > > > === Meritocracy ===
>> > > > Our intent with this incubator proposal is to start building a
>> diverse
>> > > > developer community around Ivory following the Apache meritocracy
>> > > > model. We have wanted to make the project open source and encourage
>> > > > contributors from multiple organizations from the start. We plan to
>> > > > provide plenty of support to new developers and to quickly recruit
>> > > > those who make solid contributions to committer status.
>> > > >
>> > > > === Community ===
>> > > > We are happy to report that the initial team already represents
>> > > > multiple organizations. We hope to extend the user and developer base
>> > > > further in the future and build a solid open source community around
>> > > > Ivory.
>> > > >
>> > > > === Core Developers ===
>> > > > Ivory is currently being developed by three engineers from InMobi
>> > > > (Srikanth Sundarrajan, Shwetha G S, and Shaik Idris) and two
>> > > > Hortonworks employees (Sanjay Radia and Venkatesh Seetharam). In
>> > > > addition, Rohini Palaniswamy and Thiruvel Thirumoolan were involved
>> > > > in the initial design discussions. Srikanth, Shwetha and Shaik are
>> > > > the original developers. All the engineers have built two
>> > > > generations of Data Management on Hadoop, have deep expertise in
>> > > > Hadoop, and are quite familiar with the Hadoop ecosystem. Samarth
>> > > > Gupta and Rishu Mehrothra, both from InMobi, have built the QA
>> > > > automation for Ivory.
>> > > >
>> > > > === Alignment ===
>> > > > The ASF is a natural host for Ivory given that it is already the
>> > > > home of Hadoop, Pig, Knox, HCatalog, and other emerging “big data”
>> > > > software projects. Ivory has been designed to address the data
>> > > > management challenges and opportunities of the Hadoop ecosystem
>> > > > family of products. Ivory fills a gap in the Hadoop ecosystem in
>> > > > the areas of data processing and data lifecycle management.
>> > > >
>> > > > == Known Risks ==
>> > > >
>> > > > === Orphaned products & Reliance on Salaried Developers ===
>> > > > The core developers plan to work full time on the project. There is
>> > > > very little risk of Ivory getting orphaned. Ivory is in use by
>> > > > companies we work for so the companies have an interest in its
>> > > > continued vitality.
>> > > >
>> > > > === Inexperience with Open Source ===
>> > > > All of the core developers are active users and followers of open
>> > > > source. Srikanth Sundarrajan has been contributing patches to
>> > > > Apache Hadoop and Apache Oozie, and Shwetha GS has been contributing
>> > > > patches to Apache Oozie. Seetharam Venkatesh is a committer on
>> > > > Apache Knox. Sharad Agarwal, Amareshwari SR (also an Apache Hive
>> > > > PMC member) and Sanjay Radia are PMC members on Apache Hadoop.
>> > > >
>> > > > === Homogeneous Developers ===
>> > > > The current core developers are from a diverse set of
>> > > > organizations, such as InMobi and Hortonworks. We expect to quickly
>> > > > establish a developer community that includes contributors from
>> > > > several corporations post incubation.
>> > > >
>> > > > === Reliance on Salaried Developers ===
>> > > > Currently, most developers are paid to work on Ivory, but a few are
>> > > > contributing in their spare time. However, once the project has a
>> > > > community built around it post incubation, we expect to get
>> > > > committers and developers from outside the current core developers.
>> > > >
>> > > > === Relationships with Other Apache Products ===
>> > > > Ivory is going to be used by the users of Hadoop and the Hadoop
>> > > > ecosystem in general.
>> > > >
>> > > > === An Excessive Fascination with the Apache Brand ===
>> > > > While we respect the reputation of the Apache brand and have no
>> doubts
>> > > > that it will attract contributors and users, our interest is
>> primarily
>> > > > to give Ivory a solid home as an open source project following an
>> > > > established development model. We have also given reasons in the
>> > > > Rationale and Alignment sections.
>> > > >
>> > > > == Documentation ==
>> > > > http://wiki.apache.org/incubator/IvoryProposal
>> > > >
>> > > > == Initial Source ==
>> > > > The source is currently in github repository at:
>> > > > https://github.com/sriksun/Ivory
>> > > >
>> > > > == Source and Intellectual Property Submission Plan ==
>> > > > The complete Ivory code is under Apache Software License 2.
>> > > >
>> > > > == External Dependencies ==
>> > > > The dependencies all have Apache-compatible licenses. These include
>> > > > BSD- and MIT-licensed dependencies.
>> > > >
>> > > > == Cryptography ==
>> > > > None
>> > > >
>> > > > == Required Resources ==
>> > > >
>> > > > === Mailing lists ===
>> > > >
>> > > >  * ivory-dev AT incubator DOT apache DOT org
>> > > >  * ivory-commits AT incubator DOT apache DOT org
>> > > >  * ivory-user AT incubator DOT apache DOT org
>> > > >  * ivory-private AT incubator DOT apache DOT org
>> > > >
>> > > > === Subversion Directory ===
>> > > > Git is the preferred source control system:
>> > > > git://git.apache.org/ivory
>> > > >
>> > > > === Issue Tracking ===
>> > > > JIRA IVORY
>> > > >
>> > > > == Initial Committers ==
>> > > >  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
>> > > >  * Shwetha GS (shwetha.gs AT inmobi DOT com)
>> > > >  * Shaik Idris (shaik.idris AT inmobi DOT com)
>> > > >  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
>> > > >  * Sanjay Radia (sanjay AT apache DOT org)
>> > > >  * Sharad Agarwal (sharad AT apache DOT org)
>> > > >  * Amareshwari SR (amareshwari AT apache DOT org)
>> > > >  * Samarth Gupta (samarth.gupta AT inmobi DOT com)
>> > > >  * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)
>> > > >
>> > > > == Affiliations ==
>> > > >  * Srikanth Sundarrajan (InMobi)
>> > > >  * Shwetha GS (InMobi)
>> > > >  * Shaik Idris (InMobi)
>> > > >  * Venkatesh Seetharam (Hortonworks Inc.)
>> > > >  * Sanjay Radia (Hortonworks Inc.)
>> > > >  * Sharad Agarwal (InMobi)
>> > > >  * Amareshwari SR (InMobi)
>> > > >  * Samarth Gupta (InMobi)
>> > > >  * Rishu Mehrothra (InMobi)
>> > > >
>> > > > == Sponsors ==
>> > > >
>> > > > === Champion ===
>> > > >  * Arun C Murthy (acmurthy at apache dot org)
>> > > >
>> > > > === Nominated Mentors ===
>> > > >  * Alan Gates (gates AT apache DOT org)
>> > > >  * Chris Douglas (cdouglas AT apache DOT org)
>> > > >  * Devaraj  Das (ddas AT apache DOT org)
>> > > >  * Owen O’Malley (omalley AT apache DOT org)
>> > > >
>> > > > === Sponsoring Entity ===
>> > > > Incubator PMC
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Venkatesh
>> >
>> > http://in.linkedin.com/in/seetharamvenkatesh
>> > http://about.me/SeetharamVenkatesh
>> >
>> > “Perfection (in design) is achieved not when there is nothing more to
>> add,
>> > but rather when there is nothing more to take away.”
>> > - Antoine de Saint-Exupéry
>> >
>>
>
>
>

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Jakob Homan <jg...@gmail.com>.

As part of Incubation a suitable name search will be done to verify the
name's appropriate.  I imagine Ivory would fail this test based on the
prior project, so this Ivory would need to find a new name.  Alternatively,
before the vote, the Ivory folks can find another name.  This has happened
before (Howl -> HCatalog), so it's not a huge reason to be concerned.


On Fri, Mar 15, 2013 at 5:15 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> It would be awfully nice of you not to stomp on another hadoop ecosystem
> project's google-fu when your project becomes very successful and admired
> across the hadoopverse :)
>
> Ivory isn't a fly-by-night project someone threw up on github -- it's
> generated over a dozen peer-reviewed papers, and has many watchers and dev
> forks.
>
> I don't have a vote here, but I'd say that yes, this will lead to confusion
> when people look for hadoop ivory.
>
> D
>
>
> On Fri, Mar 15, 2013 at 11:09 AM, Seetharam Venkatesh <
> venkatesh@innerzeal.com> wrote:
>
> > Hi Henry,
> >
> > Is there a concern with the current name? The closest is a tool for
> > Information Retrieval. Not sure if there is an overlap.  We will also
> > bring this up with the champion and mentors to see if this needs to be
> > vetted with the trademarks folks as well.
> >
> > Your suggestions are welcome.
> >
> > Thanks!
> >
> >
> > On Fri, Mar 15, 2013 at 10:18 AM, Henry Saputra <henry.saputra@gmail.com
> > >wrote:
> >
> > > Hi Srikanth,
> > >
> > > So does the Ivory name stay, or will the podling try to find another
> > > name once it nears graduation?
> > >
> > > - Henry
> > >
> > >
> > > On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
> > > srikanth.sundarrajan@inmobi.com> wrote:
> > >
> > > > Made a few edits to the proposal
> > > > (http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
> > > > received so far.
> > > >
> > > > Regards
> > > > Srikanth Sundarrajan
> > > >
> > > > = Ivory Proposal =
> > > >
> > > > == Abstract ==
> > > > Ivory is a data processing and management solution for Hadoop
> designed
> > > > for data motion, coordination of data pipelines, lifecycle
> management,
> > > > and data discovery. Ivory enables end consumers to quickly onboard
> > > > their data and its associated processing and management tasks on
> > > > Hadoop clusters.
> > > >
> > > > == Proposal ==
> > > > Ivory will enable easy data management via declarative mechanism for
> > > > Hadoop. Users of Ivory platform simply define infrastructure
> > > > endpoints, data sets and processing rules declaratively. These
> > > > declarative configurations are expressed in such a way that the
> > > > dependencies between these configured entities are explicitly
> > > > described. This information about inter-dependencies between various
> > > > entities allows Ivory to orchestrate and manage various data
> > > > management functions.
> > > >
> > > > The key use cases that Ivory addresses are:
> > > >  * Data Motion
> > > >  * Process orchestration and scheduling
> > > >  * Policy-based Lifecycle Management
> > > >  * Data Discovery
> > > >  * Operability/Usability
> > > >
> > > > With these features it is possible for users to onboard their data
> > > > sets with a comprehensive and holistic understanding of how, when and
> > > > where their data is managed across its lifecycle. Complex functions
> > > > such as retrying failures, identifying possible SLA breaches or
> > > > automated handling of input data changes are now simple directives.
> > > > All the administrative functions and user level functions are
> > > > available via RESTful APIs. CLI is simply a wrapper over the RESTful
> > > > APIs.
> > > >
> > > > == Background ==
> > > > Hadoop and its ecosystem of products have made storing and processing
> > > > massive amounts of data commonplace. This has enabled numerous
> > > > organizations to gain valuable insights that they never could have
> > > > achieved in the past. While it is easy to leverage Hadoop for
> > > > crunching large volumes of data, organizing data, managing life cycle
> > > > of data and processing data is fairly involved. This is solved
> > > > adequately well in a classic data platform involving data warehouses
> > > > and standard ETL (extract-transform-load) tools, but remains largely
> > > > unsolved today. In addition to data processing complexities, Hadoop
> > > > presents new sets of challenges and opportunities relating to
> > > > management of data.
> > > >
> > > > > Data Management on Hadoop encompasses data motion, process
> > > > > orchestration, lifecycle management, and data discovery, among other
> > > > > concerns that are beyond ETL. Ivory is a new data processing and
> > > > > management platform for Hadoop that solves this problem and creates
> > > > > additional opportunities by building on existing components within
> > > > > the Hadoop ecosystem (e.g. Apache Oozie and Apache Hadoop DistCp)
> > > > > without reinventing the wheel. Ivory has been in production at
> > > > > InMobi, going on its second year, and has been managing hundreds of
> > > > > feeds and processes.
> > > >
> > > > Ivory is being developed by engineers employed with InMobi and
> > > > Hortonworks. This platform addition will increase the adoption of
> > > > > Apache Hadoop by making data management tractable for end users. We
> > > > are therefore proposing to make Ivory an Apache open source project.
> > > >
> > > > == Rationale ==
> > > > The Ivory project aims to improve the usability of Apache Hadoop. As
> a
> > > > result Apache Hadoop will grow its community of users by increasing
> > > > the places Hadoop can be utilized and the use cases it will solve. By
> > > > developing Ivory in Apache we hope to gather a diverse community of
> > > > contributors, helping to ensure that Ivory is deployable for a broad
> > > > range of scenarios. Members of the Hadoop development community will
> > > > be able to influence Ivory’s roadmap, and contribute to it. We
> believe
> > > > having Ivory as part of the Apache Hadoop ecosystem will be a great
> > > > benefit to all of Hadoop's users.
> > > >
> > > > == Current Status ==
> > > > Ivory is widely deployed in production within InMobi and moving on to
> > > > its second year. A version with a valuable set of features is
> > > > developed by the list of initial committers and is hosted on github.
> > > >
> > > > === Meritocracy ===
> > > > Our intent with this incubator proposal is to start building a
> diverse
> > > > developer community around Ivory following the Apache meritocracy
> > > > model. We have wanted to make the project open source and encourage
> > > > contributors from multiple organizations from the start. We plan to
> > > > provide plenty of support to new developers and to quickly recruit
> > > > those who make solid contributions to committer status.
> > > >
> > > > === Community ===
> > > > We are happy to report that the initial team already represents
> > > > multiple organizations. We hope to extend the user and developer base
> > > > further in the future and build a solid open source community around
> > > > Ivory.
> > > >
> > > > === Core Developers ===
> > > > > Ivory is currently being developed by three engineers from InMobi
> > > > > (Srikanth Sundarrajan, Shwetha G S, and Shaik Idris) and two
> > > > > Hortonworks employees (Sanjay Radia and Venkatesh Seetharam). In
> > > > > addition, Rohini Palaniswamy and Thiruvel Thirumoolan were involved
> > > > > in the initial design discussions. Srikanth, Shwetha and Shaik are
> > > > > the original developers. All the engineers have built two
> > > > > generations of Data Management on Hadoop, have deep expertise in
> > > > > Hadoop, and are quite familiar with the Hadoop ecosystem. Samarth
> > > > > Gupta and Rishu Mehrothra, both from InMobi, have built the QA
> > > > > automation for Ivory.
> > > >
> > > > === Alignment ===
> > > > > The ASF is a natural host for Ivory given that it is already the
> > > > > home of Hadoop, Pig, Knox, HCatalog, and other emerging “big data”
> > > > > software projects. Ivory has been designed to address the data
> > > > > management challenges and opportunities of the Hadoop ecosystem
> > > > > family of products. Ivory fills a gap in the Hadoop ecosystem in
> > > > > the areas of data processing and data lifecycle management.
> > > >
> > > > == Known Risks ==
> > > >
> > > > === Orphaned products & Reliance on Salaried Developers ===
> > > > The core developers plan to work full time on the project. There is
> > > > very little risk of Ivory getting orphaned. Ivory is in use by
> > > > companies we work for so the companies have an interest in its
> > > > continued vitality.
> > > >
> > > > === Inexperience with Open Source ===
> > > > > All of the core developers are active users and followers of open
> > > > > source. Srikanth Sundarrajan has been contributing patches to
> > > > > Apache Hadoop and Apache Oozie, and Shwetha GS has been contributing
> > > > > patches to Apache Oozie. Seetharam Venkatesh is a committer on
> > > > > Apache Knox. Sharad Agarwal, Amareshwari SR (also an Apache Hive
> > > > > PMC member) and Sanjay Radia are PMC members on Apache Hadoop.
> > > >
> > > > === Homogeneous Developers ===
> > > > The current core developers are from a diverse set of organizations,
> > > > such as InMobi and Hortonworks. We expect to quickly establish a
> > > > developer community that includes contributors from several
> > > > corporations post incubation.
> > > >
> > > > === Reliance on Salaried Developers ===
> > > > Currently, most developers are paid to work on Ivory, but a few are
> > > > contributing in their spare time. However, once the project has a
> > > > community built around it post incubation, we expect to get
> > > > committers and developers from outside the current core developers.
> > > >
> > > > === Relationships with Other Apache Products ===
> > > > Ivory is going to be used by the users of Hadoop and the Hadoop
> > > > ecosystem in general.
> > > >
> > > > === An Excessive Fascination with the Apache Brand ===
> > > > While we respect the reputation of the Apache brand and have no
> > > > doubts that it will attract contributors and users, our interest is
> > > > primarily to give Ivory a solid home as an open source project
> > > > following an established development model. We have also given
> > > > reasons in the Rationale and Alignment sections.
> > > >
> > > > == Documentation ==
> > > > http://wiki.apache.org/incubator/IvoryProposal
> > > >
> > > > == Initial Source ==
> > > > The source is currently in a GitHub repository at:
> > > > https://github.com/sriksun/Ivory
> > > >
> > > > == Source and Intellectual Property Submission Plan ==
> > > > The complete Ivory code is under the Apache License, Version 2.0.
> > > >
> > > > == External Dependencies ==
> > > > All dependencies have Apache-compatible licenses. These include
> > > > BSD- and MIT-licensed dependencies.
> > > >
> > > > == Cryptography ==
> > > > None
> > > >
> > > > == Required Resources ==
> > > >
> > > > === Mailing lists ===
> > > >
> > > >  * ivory-dev AT incubator DOT apache DOT org
> > > >  * ivory-commits AT incubator DOT apache DOT org
> > > >  * ivory-user AT incubator DOT apache DOT org
> > > >  * ivory-private AT incubator DOT apache DOT org
> > > >
> > > > === Subversion Directory ===
> > > > Git is the preferred source control system:
> > > > git://git.apache.org/ivory
> > > >
> > > > === Issue Tracking ===
> > > > JIRA IVORY
> > > >
> > > > == Initial Committers ==
> > > >  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> > > >  * Shwetha GS (shwetha.gs AT inmobi DOT com)
> > > >  * Shaik Idris (shaik.idris AT inmobi DOT com)
> > > >  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
> > > >  * Sanjay Radia (sanjay AT apache DOT org)
> > > >  * Sharad Agarwal (sharad AT apache DOT org)
> > > >  * Amareshwari SR (amareshwari AT apache DOT org)
> > > >  * Samarth Gupta (samarth.gupta AT inmobi DOT com)
> > > >  * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)
> > > >
> > > > == Affiliations ==
> > > >  * Srikanth Sundarrajan (InMobi)
> > > >  * Shwetha GS (InMobi)
> > > >  * Shaik Idris (InMobi)
> > > >  * Venkatesh Seetharam (Hortonworks Inc.)
> > > >  * Sanjay Radia (Hortonworks Inc.)
> > > >  * Sharad Agarwal (InMobi)
> > > >  * Amareshwari SR (InMobi)
> > > >  * Samarth Gupta (InMobi)
> > > >  * Rishu Mehrothra (InMobi)
> > > >
> > > > == Sponsors ==
> > > >
> > > > === Champion ===
> > > >  * Arun C Murthy (acmurthy AT apache DOT org)
> > > >
> > > > === Nominated Mentors ===
> > > >  * Alan Gates (gates AT apache DOT org)
> > > >  * Chris Douglas (cdouglas AT apache DOT org)
> > > >  * Devaraj Das (ddas AT apache DOT org)
> > > >  * Owen O’Malley (omalley AT apache DOT org)
> > > >
> > > > === Sponsoring Entity ===
> > > > Incubator PMC
> > > >
> > > > --
> > > > _____________________________________________________________
> > > > The information contained in this communication is intended solely
> > > > for the use of the individual or entity to whom it is addressed and
> > > > others authorized to receive it. It may contain confidential or
> > > > legally privileged information. If you are not the intended
> > > > recipient you are hereby notified that any disclosure, copying,
> > > > distribution or taking any action in reliance on the contents of
> > > > this information is strictly prohibited and may be unlawful. If you
> > > > have received this communication in error, please notify us
> > > > immediately by responding to this email and then delete it from
> > > > your system. The firm is neither liable for the proper and complete
> > > > transmission of the information contained in this communication nor
> > > > for any delay in its receipt.
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Venkatesh
> >
> > http://in.linkedin.com/in/seetharamvenkatesh
> > http://about.me/SeetharamVenkatesh
> >
> > “Perfection (in design) is achieved not when there is nothing more to
> > add, but rather when there is nothing more to take away.”
> > - Antoine de Saint-Exupéry
> >
>

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Dmitriy Ryaboy <dv...@gmail.com>.

It would be awfully nice of you not to stomp on another Hadoop ecosystem
project's google-fu when your project becomes very successful and admired
across the hadoopverse :)

Ivory isn't a fly-by-night project someone threw up on GitHub -- it's
generated over a dozen peer-reviewed papers, and has many watchers and dev
forks.

I don't have a vote here, but I'd say that yes, this will lead to confusion
when people look for hadoop ivory.

D


On Fri, Mar 15, 2013 at 11:09 AM, Seetharam Venkatesh <
venkatesh@innerzeal.com> wrote:

> Hi Henry,
>
> Is there a concern with the current name? The closest is a tool for
> Information Retrieval. Not sure if there is an overlap. We will also bring
> this up with the champion and mentors to see if this needs to be vetted
> with the trademarks folks as well.
>
> Your suggestions are welcome.
>
> Thanks!
>
>
> On Fri, Mar 15, 2013 at 10:18 AM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
> > Hi Srikanth,
> >
> > So does the Ivory name stay, or will the podling try to find another
> > name once it nears graduation?
> >
> > - Henry
> >
> >
> > On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
> > srikanth.sundarrajan@inmobi.com> wrote:
> >
> > > Made a few edits to the proposal
> > > (http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
> > > received so far.
> > >
> > > Regards
> > > Srikanth Sundarrajan
> > >
> > > = Ivory Proposal =
> > >
> > > == Abstract ==
> > > Ivory is a data processing and management solution for Hadoop designed
> > > for data motion, coordination of data pipelines, lifecycle management,
> > > and data discovery. Ivory enables end consumers to quickly onboard
> > > their data and its associated processing and management tasks on
> > > Hadoop clusters.
> > >
> > > == Proposal ==
> > > Ivory will enable easy data management via a declarative mechanism for
> > > Hadoop. Users of the Ivory platform simply define infrastructure
> > > endpoints, data sets and processing rules declaratively. These
> > > declarative configurations are expressed in such a way that the
> > > dependencies between these configured entities are explicitly
> > > described. This information about inter-dependencies between various
> > > entities allows Ivory to orchestrate and manage various data
> > > management functions.
> > >
> > > The key use cases that Ivory addresses are:
> > >  * Data Motion
> > >  * Process orchestration and scheduling
> > >  * Policy-based Lifecycle Management
> > >  * Data Discovery
> > >  * Operability/Usability
> > >
> > > With these features it is possible for users to onboard their data
> > > sets with a comprehensive and holistic understanding of how, when and
> > > where their data is managed across its lifecycle. Complex functions
> > > such as retrying failures, identifying possible SLA breaches or
> > > automated handling of input data changes are now simple directives.
> > > All the administrative functions and user level functions are
> > > available via RESTful APIs. The CLI is simply a wrapper over the
> > > RESTful APIs.
> > >
> > > == Background ==
> > > Hadoop and its ecosystem of products have made storing and processing
> > > massive amounts of data commonplace. This has enabled numerous
> > > organizations to gain valuable insights that they never could have
> > > achieved in the past. While it is easy to leverage Hadoop for
> > > crunching large volumes of data, organizing data, managing the life
> > > cycle of data and processing data are fairly involved. This is solved
> > > adequately in a classic data platform involving data warehouses
> > > and standard ETL (extract-transform-load) tools, but remains largely
> > > unsolved on Hadoop today. In addition to data processing complexities,
> > > Hadoop presents new sets of challenges and opportunities relating to
> > > management of data.
> > >
> > > Data Management on Hadoop encompasses data motion, process
> > > orchestration, lifecycle management, and data discovery, among other
> > > concerns that are beyond ETL. Ivory is a new data processing and
> > > management platform for Hadoop that solves this problem and creates
> > > additional opportunities by building on existing components within the
> > > Hadoop ecosystem (e.g. Apache Oozie, Apache Hadoop DistCp) without
> > > reinventing the wheel. Ivory has been in production at InMobi, is
> > > going on its second year, and has been managing hundreds of feeds and
> > > processes.
> > >
> > > Ivory is being developed by engineers employed by InMobi and
> > > Hortonworks. This platform addition will increase the adoption of
> > > Apache Hadoop by making data management tractable for end users. We
> > > are therefore proposing to make Ivory an Apache open source project.
> > >
> > > == Rationale ==
> > > The Ivory project aims to improve the usability of Apache Hadoop. As a
> > > result Apache Hadoop will grow its community of users by increasing
> > > the places Hadoop can be utilized and the use cases it will solve. By
> > > developing Ivory in Apache we hope to gather a diverse community of
> > > contributors, helping to ensure that Ivory is deployable for a broad
> > > range of scenarios. Members of the Hadoop development community will
> > > be able to influence Ivory’s roadmap, and contribute to it. We believe
> > > having Ivory as part of the Apache Hadoop ecosystem will be a great
> > > benefit to all of Hadoop's users.
> > >
> > > == Current Status ==
> > > Ivory is widely deployed in production within InMobi and is moving on
> > > to its second year. A version with a valuable set of features has been
> > > developed by the initial committers and is hosted on GitHub.
> > >
> > > === Meritocracy ===
> > > Our intent with this incubator proposal is to start building a diverse
> > > developer community around Ivory following the Apache meritocracy
> > > model. We have wanted to make the project open source and encourage
> > > contributors from multiple organizations from the start. We plan to
> > > provide plenty of support to new developers and to quickly recruit
> > > those who make solid contributions to committer status.
> > >
> > > === Community ===
> > > We are happy to report that the initial team already represents
> > > multiple organizations. We hope to extend the user and developer base
> > > further in the future and build a solid open source community around
> > > Ivory.
> > >
> > > === Core Developers ===
> > > Ivory is currently being developed by three engineers from InMobi
> > > (Srikanth Sundarrajan, Shwetha GS, and Shaik Idris) and two
> > > Hortonworks employees (Sanjay Radia and Venkatesh Seetharam). In
> > > addition, Rohini Palaniswamy and Thiruvel Thirumoolan were involved
> > > in the initial design discussions. Srikanth, Shwetha and Shaik are
> > > the original developers. All the engineers have built two
> > > generations of Data Management on Hadoop, have deep expertise in
> > > Hadoop, and are quite familiar with the Hadoop ecosystem. Samarth
> > > Gupta and Rishu Mehrothra, both from InMobi, have built the QA
> > > automation for Ivory.
> > >
> > > === Alignment ===
> > > The ASF is a natural host for Ivory given that it is already the home
> > > of Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> > > projects. Ivory has been designed to solve the data management
> > > challenges and opportunities of the Hadoop ecosystem family of
> > > products. Ivory fills a gap in the Hadoop ecosystem in the areas of
> > > data processing and data lifecycle management.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned products & Reliance on Salaried Developers ===
> > > The core developers plan to work full time on the project. There is
> > > very little risk of Ivory getting orphaned. Ivory is in use at the
> > > companies we work for, so those companies have an interest in its
> > > continued vitality.
> > >
> > > === Inexperience with Open Source ===
> > > All of the core developers are active users and followers of open
> > > source. Srikanth Sundarrajan has been contributing patches to Apache
> > > Hadoop and Apache Oozie, and Shwetha GS has been contributing patches
> > > to Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox.
> > > Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member) and
> > > Sanjay Radia are PMC members on Apache Hadoop.
> > >
> > > === Homogeneous Developers ===
> > > The current core developers are from a diverse set of organizations,
> > > such as InMobi and Hortonworks. We expect to quickly establish a
> > > developer community that includes contributors from several
> > > corporations post incubation.
> > >
> > > === Reliance on Salaried Developers ===
> > > Currently, most developers are paid to work on Ivory, but a few are
> > > contributing in their spare time. However, once the project has a
> > > community built around it post incubation, we expect to get committers
> > > and developers from outside the current core developers.
> > >
> > > === Relationships with Other Apache Products ===
> > > Ivory is going to be used by the users of Hadoop and the Hadoop
> > > ecosystem in general.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > > While we respect the reputation of the Apache brand and have no doubts
> > > that it will attract contributors and users, our interest is primarily
> > > to give Ivory a solid home as an open source project following an
> > > established development model. We have also given reasons in the
> > > Rationale and Alignment sections.
> > >
> > > == Documentation ==
> > > http://wiki.apache.org/incubator/IvoryProposal
> > >
> > > == Initial Source ==
> > > The source is currently in a GitHub repository at:
> > > https://github.com/sriksun/Ivory
> > >
> > > == Source and Intellectual Property Submission Plan ==
> > > The complete Ivory code is under the Apache License, Version 2.0.
> > >
> > > == External Dependencies ==
> > > All dependencies have Apache-compatible licenses. These include
> > > BSD- and MIT-licensed dependencies.
> > >
> > > == Cryptography ==
> > > None
> > >
> > > == Required Resources ==
> > >
> > > === Mailing lists ===
> > >
> > >  * ivory-dev AT incubator DOT apache DOT org
> > >  * ivory-commits AT incubator DOT apache DOT org
> > >  * ivory-user AT incubator DOT apache DOT org
> > >  * ivory-private AT incubator DOT apache DOT org
> > >
> > > === Subversion Directory ===
> > > Git is the preferred source control system: git://git.apache.org/ivory
> > >
> > > === Issue Tracking ===
> > > JIRA IVORY
> > >
> > > == Initial Committers ==
> > >  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> > >  * Shwetha GS (shwetha.gs AT inmobi DOT com)
> > >  * Shaik Idris (shaik.idris AT inmobi DOT com)
> > >  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
> > >  * Sanjay Radia (sanjay AT apache DOT org)
> > >  * Sharad Agarwal (sharad AT apache DOT org)
> > >  * Amareshwari SR (amareshwari AT apache DOT org)
> > >  * Samarth Gupta (samarth.gupta AT inmobi DOT com)
> > >  * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)
> > >
> > > == Affiliations ==
> > >  * Srikanth Sundarrajan (InMobi)
> > >  * Shwetha GS (InMobi)
> > >  * Shaik Idris (InMobi)
> > >  * Venkatesh Seetharam (Hortonworks Inc.)
> > >  * Sanjay Radia (Hortonworks Inc.)
> > >  * Sharad Agarwal (InMobi)
> > >  * Amareshwari SR (InMobi)
> > >  * Samarth Gupta (InMobi)
> > >  * Rishu Mehrothra (InMobi)
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >  * Arun C Murthy (acmurthy AT apache DOT org)
> > >
> > > === Nominated Mentors ===
> > >  * Alan Gates (gates AT apache DOT org)
> > >  * Chris Douglas (cdouglas AT apache DOT org)
> > >  * Devaraj Das (ddas AT apache DOT org)
> > >  * Owen O’Malley (omalley AT apache DOT org)
> > >
> > > === Sponsoring Entity ===
> > > Incubator PMC
> > >
> > >
> >
>
>
>
> --
> Regards,
> Venkatesh
>
> http://in.linkedin.com/in/seetharamvenkatesh
> http://about.me/SeetharamVenkatesh
>
> “Perfection (in design) is achieved not when there is nothing more to add,
> but rather when there is nothing more to take away.”
> - Antoine de Saint-Exupéry
>

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Henry Saputra <he...@gmail.com>.

Well, the question is whether it is OK for a podling to keep a name that
could conflict until it is time to graduate and become a TLP.

As Jakob has mentioned, part of incubation is to check the name, and it
looks like that check will fail later.

It would be better to propose a new name, I suppose.

- Henry



Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Seetharam Venkatesh <ve...@innerzeal.com>.

Hi Henry,

Is there a concern with the current name? The closest is a tool for
Information Retrieval. Not sure if there is an overlap. We will also bring
this up with the champion and mentors to see if this needs to be vetted
with the trademarks folks as well.

Your suggestions are welcome.

Thanks!


On Fri, Mar 15, 2013 at 10:18 AM, Henry Saputra <he...@gmail.com> wrote:

> Hi Srikanth,
>
> So does the Ivory name stay, or will the podling try to find another name
> once it nears graduation?
>
> - Henry
>
>
> On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
> srikanth.sundarrajan@inmobi.com> wrote:
>
> > Made few edits to the proposal (
> > http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
> > received so far.
> >
> > Regards
> > Srikanth Sundarrajan
> >
>



-- 
Regards,
Venkatesh

http://in.linkedin.com/in/seetharamvenkatesh
http://about.me/SeetharamVenkatesh

“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Henry Saputra <he...@gmail.com>.

Hi Srikanth,

So does the Ivory name stay, or will the podling try to find another
name once it nears graduation?

- Henry


On Fri, Mar 15, 2013 at 12:34 AM, Srikanth Sundarrajan <
srikanth.sundarrajan@inmobi.com> wrote:

> Made few edits to the proposal (
> http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
> received so far.
>
> Regards
> Srikanth Sundarrajan
>
>

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Srikanth Sundarrajan <sr...@inmobi.com>.

Made a few edits to the proposal
(http://wiki.apache.org/incubator/IvoryProposal) as per the feedback
received so far.

Regards
Srikanth Sundarrajan

= Ivory Proposal =

== Abstract ==
Ivory is a data processing and management solution for Hadoop designed
for data motion, coordination of data pipelines, lifecycle management,
and data discovery. Ivory enables end consumers to quickly onboard
their data and its associated processing and management tasks on
Hadoop clusters.

== Proposal ==
Ivory will enable easy data management on Hadoop via a declarative
mechanism. Users of the Ivory platform simply define infrastructure
endpoints, data sets and processing rules declaratively. These
declarative configurations are expressed in such a way that the
dependencies between the configured entities are explicitly
described. This information about inter-dependencies between the
entities allows Ivory to orchestrate and manage various data
management functions.
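
As a rough illustration of why explicit dependencies matter, the
sketch below orders a handful of declared entities so that each one
comes after everything it depends on. It is not Ivory's configuration
format or API; the entity names, the dictionary layout and both helper
functions are hypothetical, and it assumes Python 3.9+ for the standard
library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical declarations: entity name -> (kind, names it depends on).
# These names are illustrative only and do not come from Ivory itself.
ENTITIES = {
    "primary-cluster": ("cluster", []),
    "raw-clicks": ("feed", ["primary-cluster"]),
    "hourly-summary": ("feed", ["primary-cluster"]),
    "summarize-clicks": ("process", ["raw-clicks", "hourly-summary"]),
}


def scheduling_order(entities):
    """Return entity names so that each follows everything it depends on."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_kind, deps) in entities.items()}
    return list(TopologicalSorter(graph).static_order())


def dependents_of(entities, target):
    """Return names of entities that directly declare a dependency on target."""
    return sorted(
        name for name, (_kind, deps) in entities.items() if target in deps
    )


if __name__ == "__main__":
    print(scheduling_order(ENTITIES))
    print(dependents_of(ENTITIES, "primary-cluster"))
```

Because the dependencies are declared rather than implied, a question
such as "what is affected if this cluster changes?" becomes a simple
lookup, which is the kind of information an orchestrator needs.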

The key use cases that Ivory addresses are:
 * Data Motion
 * Process orchestration and scheduling
 * Policy-based Lifecycle Management
 * Data Discovery
 * Operability/Usability

With these features it is possible for users to onboard their data
sets with a comprehensive and holistic understanding of how, when and
where their data is managed across its lifecycle. Complex functions
such as retrying failures, identifying possible SLA breaches or
automated handling of input data changes are now simple directives.
All the administrative and user-level functions are available via
RESTful APIs. The CLI is simply a wrapper over the RESTful APIs.

== Background ==
Hadoop and its ecosystem of products have made storing and processing
massive amounts of data commonplace. This has enabled numerous
organizations to gain valuable insights that they never could have
achieved in the past. While it is easy to leverage Hadoop for
crunching large volumes of data, organizing data, managing the life
cycle of data and processing data are fairly involved. These problems
are solved adequately in a classic data platform involving data
warehouses and standard ETL (extract-transform-load) tools, but remain
largely unsolved on Hadoop today. In addition to data processing
complexities, Hadoop presents new sets of challenges and opportunities
relating to management of data.

Data Management on Hadoop encompasses data motion, process
orchestration, lifecycle management and data discovery, among other
concerns that are beyond ETL. Ivory is a new data processing and
management platform for Hadoop that solves this problem and creates
additional opportunities by building on existing components within the
Hadoop ecosystem (e.g. Apache Oozie and Apache Hadoop DistCp) without
reinventing the wheel. Ivory has been in production at InMobi, going
on its second year, and has been managing hundreds of feeds and
processes.

Ivory is being developed by engineers employed by InMobi and
Hortonworks. This platform addition will increase the adoption of
Apache Hadoop by making data management tractable for end users. We
are therefore proposing to make Ivory an Apache open source project.

== Rationale ==
The Ivory project aims to improve the usability of Apache Hadoop. As a
result Apache Hadoop will grow its community of users by increasing
the places Hadoop can be utilized and the use cases it will solve. By
developing Ivory in Apache we hope to gather a diverse community of
contributors, helping to ensure that Ivory is deployable for a broad
range of scenarios. Members of the Hadoop development community will
be able to influence Ivory’s roadmap, and contribute to it. We believe
having Ivory as part of the Apache Hadoop ecosystem will be a great
benefit to all of Hadoop's users.

== Current Status ==
Ivory is widely deployed in production within InMobi and moving on to
its second year. A version with a valuable set of features has been
developed by the initial committers and is hosted on GitHub.

=== Meritocracy ===
Our intent with this incubator proposal is to start building a diverse
developer community around Ivory following the Apache meritocracy
model. We have wanted to make the project open source and encourage
contributors from multiple organizations from the start. We plan to
provide plenty of support to new developers and to quickly recruit
those who make solid contributions to committer status.

=== Community ===
We are happy to report that the initial team already represents
multiple organizations. We hope to extend the user and developer base
further in the future and build a solid open source community around
Ivory.

=== Core Developers ===
Ivory is currently being developed by three engineers from InMobi –
Srikanth Sundarrajan, Shwetha G S, and Shaik Idris – and two
Hortonworks employees – Sanjay Radia and Venkatesh Seetharam. In
addition, Rohini Palaniswamy and Thiruvel Thirumoolan were also
involved in the initial design discussions. Srikanth, Shwetha and
Shaik are the original developers. All the engineers have built two
generations of Data Management on Hadoop, have deep expertise in
Hadoop and are quite familiar with the Hadoop ecosystem. Samarth Gupta
and Rishu Mehrothra, both from InMobi, have built the QA automation
for Ivory.

=== Alignment ===
The ASF is a natural host for Ivory given that it is already the home
of Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
projects. Ivory has been designed to solve the data management
challenges and opportunities of the Hadoop ecosystem family of
products. Ivory fills a gap in the Hadoop ecosystem in the areas of
data processing and data lifecycle management.

== Known Risks ==

=== Orphaned products & Reliance on Salaried Developers ===
The core developers plan to work full time on the project. There is
very little risk of Ivory getting orphaned. Ivory is in use by the
companies we work for, so those companies have an interest in its
continued vitality.

=== Inexperience with Open Source ===
All of the core developers are active users and followers of open
source. Srikanth Sundarrajan has been contributing patches to Apache
Hadoop and Apache Oozie, and Shwetha GS has been contributing patches
to Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox.
Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member) and
Sanjay Radia are PMC members on Apache Hadoop.

=== Homogeneous Developers ===
The current core developers are from a diverse set of organizations such
as InMobi and Hortonworks. We expect to quickly establish a developer
community that includes contributors from several corporations post
incubation.

=== Reliance on Salaried Developers ===
Currently, most developers are paid to do work on Ivory but few are
contributing in their spare time. However, once the project has a
community built around it post incubation, we expect to get committers
and developers from outside the current core developers.

=== Relationships with Other Apache Products ===
Ivory is going to be used by the users of Hadoop and the Hadoop
ecosystem in general.

=== An Excessive Fascination with the Apache Brand ===
While we respect the reputation of the Apache brand and have no doubts
that it will attract contributors and users, our interest is primarily
to give Ivory a solid home as an open source project following an
established development model. We have also given reasons in the
Rationale and Alignment sections.

== Documentation ==
http://wiki.apache.org/incubator/IvoryProposal

== Initial Source ==
The source is currently in a GitHub repository at:
https://github.com/sriksun/Ivory

== Source and Intellectual Property Submission Plan ==
The complete Ivory code is under the Apache License, Version 2.0.

== External Dependencies ==
The dependencies all have Apache-compatible licenses. These include
BSD- and MIT-licensed dependencies.

== Cryptography ==
None

== Required Resources ==

=== Mailing lists ===

 * ivory-dev AT incubator DOT apache DOT org
 * ivory-commits AT incubator DOT apache DOT org
 * ivory-user AT incubator DOT apache DOT org
 * ivory-private AT incubator DOT apache DOT org

=== Subversion Directory ===
Git is the preferred source control system: git://git.apache.org/ivory

=== Issue Tracking ===
JIRA IVORY

== Initial Committers ==
 * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
 * Shwetha GS (shwetha.gs AT inmobi DOT com)
 * Shaik Idris (shaik.idris AT inmobi DOT com)
 * Venkatesh Seetharam (Venkatesh AT apache DOT org)
 * Sanjay Radia (sanjay AT apache DOT org)
 * Sharad Agarwal (sharad AT apache DOT org)
 * Amareshwari SR (amareshwari AT apache DOT org)
 * Samarth Gupta (samarth.gupta AT inmobi DOT com)
 * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)

== Affiliations ==
 * Srikanth Sundarrajan (InMobi)
 * Shwetha GS (InMobi)
 * Shaik Idris (InMobi)
 * Venkatesh Seetharam (Hortonworks Inc.)
 * Sanjay Radia (Hortonworks Inc.)
 * Sharad Agarwal (InMobi)
 * Amareshwari SR (InMobi)
 * Samarth Gupta (InMobi)
 * Rishu Mehrothra (InMobi)

== Sponsors ==

=== Champion ===
 * Arun C Murthy (acmurthy AT apache DOT org)

=== Nominated Mentors ===
 * Alan Gates (gates AT apache DOT org)
 * Chris Douglas (cdouglas AT apache DOT org)
 * Devaraj Das (ddas AT apache DOT org)
 * Owen O’Malley (omalley AT apache DOT org)

=== Sponsoring Entity ===
Incubator PMC


Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Arun C Murthy <ac...@hortonworks.com>.

+1, this is a great addition!

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Jarek Jarcec Cecho <ja...@apache.org>.

Hi Srikanth,
I've read the proposal and the documentation available on GitHub, and the project seems very interesting to me. I'm currently focusing mainly on Apache Sqoop [1], and I can see a lot of opportunities for integration. I'll be more than happy to help with Ivory going forward. If you're open to the idea of "open enrolment", I'd be glad to sign up as an initial contributor.

Jarcec

P.S. - I found a few nits in the docs, so I've created a GitHub pull request to fix them.

Links:
1: http://sqoop.apache.org/

Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Srikanth Sundarrajan <sr...@inmobi.com>.

Thanks. Yes, Git seems an attractive option for version control.

Regards
Srikanth Sundarrajan

On Thu, Mar 14, 2013 at 6:26 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

>
> +1, this will be a great addition to the Hadoop eco-system!
>
> The proposal looks fine overall. I quickly searched around for the name
> ivory, it looks to be a safe one, but someone needs to do due diligence?
>
> And I think you can choose to have git as the version control if you feel
> like it.
>
> Thanks,
> +Vinod Kumar Vavilapalli
>
> On Mar 13, 2013, at 10:00 AM, Srikanth Sundarrajan wrote:
>
> > = Ivory Proposal =
> >
> > == Abstract ==
> > Ivory is a data processing and management solution for Hadoop designed
> for
> > data motion, coordination of data pipelines, lifecycle management, and
> > data discovery. Ivory enables end consumers to quickly onboard their data
> > and its associated processing and management tasks on Hadoop clusters.
> >
> > == Proposal ==
> > Ivory will enable easy data management via declarative mechanism for
> > Hadoop. Users of Ivory platform simply define infrastructure endpoints,
> > data sets and processing rules declaratively. These configurations
> > are expressed in such a way that the dependencies between
> > these entities are explicitly described. This information about
> > inter-dependencies between various entities allows Ivory to orchestrate
> and
> > manage various data management functions.
> >
> > The key use cases that Ivory addresses are:
> > * Data Motion
> > * Process orchestration and scheduling
> > * Policy-based Lifecycle Management
> > * Data Discovery
> > * Operability/Usability
> >
> > With these features it is possible for users to onboard their data sets
> > with
> > a comprehensive and holistic understanding of how, when and where their
> > data
> > is managed across its lifecycle. Complex functions such as retrying
> > failures,
> > identifying possible SLA breaches or automated handling of input data
> > changes
> > are now simple directives. All the administrative functions and user
> level
> > functions are available via RESTful APIs. CLI is simply a wrapper over
> the
> > RESTful APIs.
> >
> > == Background ==
> > Hadoop and its ecosystem of products have made storing and processing
> > massive
> > amounts of data commonplace. This has enabled numerous organizations to
> > gain
> > valuable insights that they never could have achieved in the past. While
> it
> > is easy to leverage Hadoop for crunching large volumes of data,
> organizing
> > data, managing life cycle of data and processing data is fairly involved.
> > This is solved adequately well in a classic data platform involving data
> > warehouses and standard ETL (extract-transform-load) tools, but remains
> > largely
> > unsolved today. In addition to data processing complexities, Hadoop
> > presents
> > new sets of challenges and opportunities relating to management of data.
> >
> > Data Management on Hadoop encompasses data motion, process orchestration,
> > lifecycle management, data discovery, etc. among other concerns that are
> > beyond
> > ETL. Ivory is a new data processing and management platform for Hadoop
> that
> > solves this problem and creates additional opportunities by building on
> > existing
> > components within the Hadoop ecosystem (ex. Apache Oozie, Apache Hadoop
> > DistCp
> > etc.) without reinventing the wheel. Ivory has been in production at
> > InMobi,
> > going on its second year and has been managing hundreds of feeds and
> > processes.
> >
> > Ivory is being developed by engineers employed with InMobi, Hortonworks
> and
> > Yahoo!. This platform addition will increase the adoption of Apache
> Hadoop
> > by
> > driving data management tractable for end users. We are therefore
> proposing
> > to
> > make Ivory an Apache open source project.
> >
> > == Rationale ==
> > The Ivory project aims to improve the usability of Apache Hadoop. As a
> > result
> > Apache Hadoop will grow its community of users by increasing the places
> > Hadoop
> > can be utilized and the use cases it will solve. By developing Ivory in
> > Apache
> > we hope to gather a diverse community of contributors, helping to ensure
> > that
> > Ivory is deployable for a broad range of scenarios. Members of the Hadoop
> > development community will be able to influence Ivory’s roadmap, and
> > contribute
> > to it. We believe having Ivory as part of the Apache Hadoop ecosystem
> will
> > be
> > a great benefit to all of Hadoop's users.
> >
> > == Current Status ==
> > Ivory is widely deployed in production within InMobi and moving on to its
> > second year. A version with a valuable set of features is developed by
> the
> > list of initial committers and is hosted on github.
> >
> > === Meritocracy ===
> > Our intent with this incubator proposal is to start building a diverse
> > developer
> > community around Ivory following the Apache meritocracy model. We have
> > wanted to
> > make the project open source and encourage contributors from multiple
> > organizations from the start. We plan to provide plenty of support to new
> > developers and to quickly recruit those who make solid contributions to
> > committer status.
> >
> > === Community ===
> > We are happy to report that the initial team already represents multiple
> > organizations. We hope to extend the user and developer base further in
> the
> > future and build a solid open source community around Ivory.
> >
> > === Core Developers ===
> > Ivory is currently being developed by three engineers from InMobi –
> > Srikanth Sundarrajan, Shwetha G S, and Shaik Idris, two Hortonworks
> > employees –
> > Sanjay Radia and Venkatesh Seetharam. In addition, two Yahoo! employees,
> > Rohini Palaniswamy and Thiruvel Thirumoolan, are also involved. Srikanth,
> > Shwetha and Shaik are the original developers. All the engineers have
> built
> > two generations of Data Management on Hadoop, having deep expertise in
> > Hadoop
> > and are quite familiar with the Hadoop Ecosystem.
> >
> > === Alignment ===
> > The ASF is a natural host for Ivory given that it is already the home of
> > Hadoop,
> > Pig, Knox, HCatalog, and other emerging “big data” software projects.
> Ivory
> > has
> > been designed to solve the data management challenges and opportunities
> of
> > the
> > Hadoop ecosystem family of products. Ivory fills the gap that Hadoop
> > ecosystem
> > has been lacking in the areas of data processing and data lifecycle
> > management.
> >
> > == Known Risks ==
> >
> > === Orphaned products & Reliance on Salaried Developers ===
> > The core developers plan to work full time on the project. There is very
> > little
> > risk of Ivory getting orphaned. Ivory is in use by companies we work for
> so
> > the
> > companies have an interest in its continued vitality.
> >
> > === Inexperience with Open Source ===
> > All of the core developers are active users and followers of open source.
> > Srikanth Sundarrajan has been contributing patches to Apache Hadoop and
> > Apache
> > Oozie, Shwetha GS has been contributing patches to Apache Oozie.
> > Seetharam Venkatesh is a committer on Apache Knox. Rohini Palaniswamy is
> a
> > committer on Apache PIG. Sharad Agarwal, Amareshwari SR (also a Apache
> Hive
> > PMC member) and Sanjay Radia are PMC members on Apache Hadoop.
> >
> > === Homogeneous Developers ===
> > The current core developers are from diverse set of organizations such as
> > InMobi, Hortonworks, and, Yahoo!. We expect to quickly establish a
> > developer
> > community that includes contributors from several corporations post
> > incubation.
> >
> > === Reliance on Salaried Developers ===
> > Currently, most developers are paid to do work on Ivory but few are
> > contributing
> > in their spare time. However, once the project has a community built
> around
> > it
> > post incubation, we expect to get committers and developers from outside
> > the
> > current core developers.
> >
> > === Relationships with Other Apache Products ===
> > Ivory is going to be used by the users of Hadoop and the Hadoop ecosystem
> > in
> > general.
> >
> > === A Excessive Fascination with the Apache Brand ===
> > While we respect the reputation of the Apache brand and have no doubts
> that
> > it
> > will attract contributors and users, our interest is primarily to give
> > Ivory a
> > solid home as an open source project following an established development
> > model.
> > We have also given reasons in the Rationale and Alignment sections.
> >
> > == Documentation ==
> > There is documentation in github repository at:
> > https://github.com/sriksun/Ivory
> >
> > == Initial Source ==
> > The source is currently in github repository at:
> > https://github.com/sriksun/Ivory
> >
> > == Source and Intellectual Property Submission Plan ==
> > The complete Ivory code is under Apache Software License 2.
> >
> > == External Dependencies ==
> > The dependencies all have Apache compatible licenses. These include BSD,
> > MIT licensed dependencies.
> >
> > == Cryptography ==
> > None
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> > * ivory-dev AT incubator DOT apache DOT org
> > * ivory-commits AT incubator DOT apache DOT org
> > * ivory-user AT incubator apache DOT org
> > * ivory-private AT incubator DOT apache DOT org
> >
> > === Subversion Directory ===
> > https://svn.apache.org/repos/asf/incubator/ivory
> >
> > === Issue Tracking ===
> > JIRA IVORY
> >
> > == Initial Committers ==
> > * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> > * Shwetha GS (shwetha.gs AT inmobi DOT com)
> > * Shaik Idris (shaik.idris AT inmobi DOT com)
> > * Venkatesh Seetharam (Venkatesh AT apache DOT com)
> > * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
> > * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
> > * Sanjay Radia (sanjay AT apache DOT org)
> > * Sharad Agarwal (sharad AT apache DOT org)
> > * Amareshwari SR (amareshwari AT apache DOT org)
> >
> > == Affiliations ==
> > * Srikanth Sundarrajan (InMobi)
> > * Shwetha GS (InMobi)
> > * Shaik Idris (InMobi)
> > * Venkatesh Seetharam (Hortonworks Inc)
> > * Rohini Palaniswamy (Yahoo! Inc)
> > * Thiruvel Thirumoolan (Yahoo! Inc)
> > * Sanjay Radia (Hortonworks Inc)
> > * Sharad Agarwal (InMobi)
> > * Amareshwari SR (InMobi)
> >
> > == Sponsors ==
> >
> > === Champion ===
> > * Arun C Murthy (acmurthy at apache dot org)
> >
> > === Nominated Mentors ===
> > * Alan Gates (gates AT apache DOT org)
> > * Chris Douglas (cdouglas AT apache DOT org)
> > * Devaraj Das (ddas AT apache DOT org)
> > * Owen O’Malley (omalley AT apache DOT org)
> >
> > === Sponsoring Entity ===
> > Incubator PMC
> >
>
>


Re: [PROPOSAL] Ivory - Hadoop data management and processing platform

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

+1, this will be a great addition to the Hadoop eco-system!

The proposal looks fine overall. I quickly searched around for the name Ivory and it looks to be a safe one, but someone needs to do the due diligence.

And I think you can choose to have git as the version control if you feel like it.

Thanks,
+Vinod Kumar Vavilapalli
