Posted to general@incubator.apache.org by Justin Erenkrantz <ju...@erenkrantz.com> on 2010/01/04 20:46:51 UTC

[PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Hi general@,

On behalf of the OODT community, I'd like to bring the following
proposal for discussion within the Incubator.  I've talked with Chris
and Dave about bringing this project to Apache since May of 2007 (ICSE
in Minneapolis at the Ruth's Chris Steak house!) - so I'm very happy
to see this proposal arrive and be a mentor for it.

Comments/suggestions/mentor volunteers welcome.

Thanks!  -- justin

http://wiki.apache.org/incubator/OODTProposal

--- Current Wiki Text below ---

OODT, a grid middleware framework for science data processing,
information integration, and retrieval.

Abstract

OODT is a grid middleware framework used on a number of successful
projects at NASA's Jet Propulsion Laboratory/California Institute of
Technology, and many other research institutions and universities,
specifically those part of the:

    * National Cancer Institute's (NCI's) Early Detection Research
Network (EDRN) project - more than 40 institutions, all performing research
into discovering biomarkers, which are early indicators of disease.
    * NASA's Planetary Data System (PDS) - NASA's planetary data
archive, a repository and registry for all planetary data collected
over the past 30+ years.
    * various Earth Science data processing missions, including
Seawinds/QuickSCAT, the Orbiting Carbon Observatory, the NPP Sounder
PEATE project, and the Soil Moisture Active Passive (SMAP) mission.

From the OODT website:

It's middleware for metadata:

    * Transparent access to distributed resources
    * Data discovery and query optimization
    * Distributed processing and virtual archives

It's a software architecture:

    * Models for information representation
    * Solutions to knowledge capture problems
    * Unification of technology, data, and metadata

Proposal

OODT is an established open source project, with 9+ years of
existence, and deployment at universities, federal research
institutions, other NASA centers, and the NIH (it was runner-up for NASA
Software of the Year in 2003). It has a strong community of people who
operate it and support its growth. Our proposal is to bring OODT into
Apache to strengthen its support and its capabilities even further by
building on Apache's brand and its large, growing community of
developers from all over the world. In short, bringing OODT into
Apache will significantly enhance OODT's widespread use, will likely
improve its codebase, and furthermore will help Apache philosophy and
community spread into OODT's already large community-base reaching
across government, academia and industry.

OODT will be, to the best of our knowledge, the first grid community
project to bear the Apache brand. By grid technology, we mean a
technology that provides the ability to create virtual organizations,
as originally described by Kesselman and Foster in their seminal paper
on grid computing. OODT provides both computational and data grid
support, and is built with a component-philosophy. OODT includes
components that allow for virtual information integration across
organizations (provided by the Profile, Product and Query server
components), and that allow for distributed data management and
processing across heterogeneous virtual organizations (provided by the
Catalog and Archive Service set of components, including File Manager,
Workflow Manager and Resource Manager).

Each set of components exists as an independently organized Maven2
project; the projects reference each other (where appropriate), forming a
layered set of components and a framework for grid computing.

Background

OODT is an established project within JPL and in use at several NASA
centers, as well as universities, and other government organizations
and industrial collaborations. Chris Mattmann, a JPL employee, and ASF
PMC (Lucene) and Committer (Nutch, Tika), has been working for the
past 2 years on obtaining the necessary permission from JPL to release
OODT into Apache. After initially being stalled, JPL has granted
permission to allow OODT into Apache.

Through his academic relationship with Justin Erenkrantz, Apache
President, and through their collective Ph.D. studies, OODT has been
discussed between Chris and Justin on several occasions, and Justin
offered to help champion OODT into the Apache Incubator when JPL was
ready to release OODT. In December 2009, that permission was granted.

This proposal is the result of the above efforts and related
discussions. Some alternatives to incubation, like Apache Labs, came up
during the discussions, but we believe that taking the project to the
Incubator is the best way to start growing a viable Apache-based
community to sustain OODT. Furthermore, given its larger code base and
existing sub-projects, the goal would be for OODT to leverage the
incubator to graduate into Apache's first top-level grid project,
rather than graduate into a sub-project of an existing TLP.

Rationale

Grid computing has been around for the past 10 years and has gained
widespread recognition and attention in industry and academia.
Scientific collaborations are increasingly virtual and require the
capabilities (data and compute) of thousands of computers and
resources that span organizations. There are a number of existing grid
technologies (Globus being the most popular, DSpace, iRODS/Storage
Resource Broker, see this paper for a full study); however, Apache has
no current grid technology under its umbrella and world-renowned think
tank. Moreover, efforts are few and far between in terms of standing up
Apache-based software that is applicable to the scientific community
and grid community outside of use of fine-grained components in these
systems. Other open source organizations (e.g., the Global
Organization for Earth System Science Portals, GO-ESSP) have embraced
the construction of such technology and there is a lot of work going
on, e.g., at NOAA. This proposal aims to remedy this fact and to bring
scientific data management/grid software into the Apache family and
its worldwide community.

OODT is a widely successful grid project with applicability and
existing deployments across broad-reaching domains (planetary and
earth sciences, cancer research/biomedicine, climate modeling and
atmospheric science, etc.). The marriage of OODT and Apache will
engender OODT's widespread, global use via the Apache brand, and will
make Apache a player in the grid/scientific data community.

Initial Goals

The initial goals of the proposed project are:

    * Stand up a sustaining Apache-based community around the OODT codebase.
    * Active relationships and possible cooperation with related
projects and communities.
    * Refactor and bring up-to-date the OODT profile and product
server components.
    * Explore various underlying communication substrates. OODT
currently uses REST (via its Web-Grid component).
    * Create configuration-based OODT deployments. Currently the
deployments are primarily code-based, or the configuration is strewn
about the various sub-components. The goal would be to bring this
configuration under a single umbrella project. The idea would be to
create science data pipelines from configuration.
    * Explore Python-based client and server implementations of OODT
and implementations in other languages (Ruby).
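
As a rough illustration of what "science data pipelines from
configuration" might look like, the sketch below drives a sequence of
processing steps from a plain dictionary rather than from hard-coded
calls. It is not OODT code: the step functions, the STEPS registry,
PIPELINE_CONFIG and run_pipeline are all hypothetical names chosen only
for this example.

```python
# Hypothetical sketch: none of these names come from OODT itself.

def ingest(record):
    """Pretend ingestion step: mark the record as ingested."""
    return {**record, "ingested": True}


def extract_metadata(record):
    """Pretend metadata step: derive a file extension from the file name."""
    name = record["filename"]
    ext = name.rsplit(".", 1)[-1] if "." in name else ""
    return {**record, "extension": ext}


# Registry mapping step names (as they appear in configuration) to functions.
STEPS = {
    "ingest": ingest,
    "extract_metadata": extract_metadata,
}

# The pipeline itself is data, so it can change without touching code.
PIPELINE_CONFIG = {
    "name": "example-pipeline",
    "steps": ["ingest", "extract_metadata"],
}


def run_pipeline(config, record):
    """Apply each configured step to the record, in order."""
    for step_name in config["steps"]:
        if step_name not in STEPS:
            raise ValueError(f"unknown step in configuration: {step_name}")
        record = STEPS[step_name](record)
    return record


if __name__ == "__main__":
    print(run_pipeline(PIPELINE_CONFIG, {"filename": "granule_001.hdf"}))
```

In a layout like this, reordering or dropping a step is an edit to the
configuration only, which is the property the goal above is after.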

Current Status

Meritocracy

Many of the proposed initial committers are familiar with the
meritocracy principles of Apache, and have already worked on the
various source codebases (contributing via patches, emails, JIRA
issues, and in Mattmann's case, as a Nutch, and Tika committer, and
Lucene PMC member). We will follow the normal meritocracy rules also
with other potential contributors.

Community

There is an existing, established community of developers and users of
OODT within over 40 centers at NASA, NIH, DOE and academia, however
there is no Apache OODT community as of yet. Our principal goal of
this effort is to leverage the Apache Incubator to grow an Apache
community base (in addition to OODT's existing community), and to
build a self-sustaining community around this shared vision, and
eventual Apache TLP status for OODT. With many sub projects (CAS,
Product/Profile servers, Query Server, Web-grid, commons, etc.), OODT
should attract a broad audience of developers with various interests.

Core Developers

The initial set of developers comes from NASA JPL, with various
backgrounds and with different but compatible needs for the proposed
project. JPL is home to data management and grid projects spanning the
domains of cancer research/bioinformatics, earth science, planetary
science, astrophysics, and climate modeling.

Alignment

As Apache's first grid-based framework, OODT will likely be widely used by
various open source, scientific and commercial projects, both together
with and independent of other Apache tools. With OODT's existing
community we will also bring developers and organizations outside of
Apache into the Apache ecosystem.

Known Risks

Orphaned products

OODT has supported itself through successful deployments at NASA, at
the U.S. National Institutes of Health (NIH), and recently at
DOE-based laboratories and at academic centers. Further, OODT has been
an active participant in IEEE/ACM-based conferences and
meetings/journal publications over the past 9 years. There is active
support on several existing NASA earth science missions, and the team
at JPL is experienced and will continue to champion and develop OODT
in the Apache area.

Our goal is to take OODT from the early stage of Apache Incubation
into a thriving Apache top-level project, and leverage it in the
existing manner at NASA, the NIH, at DOE, and in academia and
industry. Since OODT is a grid framework, it depends on many external
services and projects, no one of which controls OODT's code-base.

We feel that the time is ripe to bring OODT into Apache and to grow
the community of developers who maintain OODT. We feel that Incubation
will bring a slew of industry-based developers (and even those in
academia, and government) who have no prior experience with OODT, but
who could use OODT at their jobs and who are attracted to the brand
name and community that Apache brings. We want to attract such
developers to become part of the core OODT development team and
to take part in project management.

Inexperience with Open Source

All the initial developers have worked on open source before and at
least one (Mattmann) is a committer and PMC member in the Apache
Lucene ecosystem. Sean Kelly is a well-respected Plone committer and
has made several open source contributions over the years to FreeBSD
and other software. Foster, McCleese and Woollard have all contributed
to Apache projects by way of email, mailing lists, issue reporting and
testing.

Homogenous Developers

The initial developers come from a variety of backgrounds and with a
variety of needs for the proposed framework.

Reliance on Salaried Developers

All of the proposed initial developers are paid to work on this or
related projects, but the proposed project is not the primary task for
anyone.

Relationships with Other Apache Products

OODT is related to at least the following Apache projects. None of the
projects is a direct competitor for OODT, but there are many cases of
potential overlap in functionality.

    * Apache Lucene - The family of Lucene products that implement
search services are naturally of use in a grid environment such as
OODT. In fact, OODT has integrated with many of these projects (Tika,
SOLR and Lucene-java) already. We see OODT as a grid environment that
makes use of search services.
    * Apache UIMA - The UIMA project provides a framework and
pluggable tools for analyzing text content and extracting information.
Example tools include language identification, sentence boundary
detection and "entity extraction" - finding references to people,
places and organizations. OODT is related to UIMA in the sense that it
is a framework to provide pluggable connections to content and
information, but the focus of OODT is on scientific data sets, and
additionally on repositories and catalogs/registries that catalog
information about those datasets and that store the physical bits.
Further, OODT is a grid technology, meant to enable the creation of
virtual organizations, which is not UIMA's focus. Finally, OODT
contains both an information integration component, as well as a
science data processing component, which UIMA does not.

OODT is also related to Apache projects involving databases, such as
the Apache DB project; however, scientific data is not limited to
traditional DBMSes and involves both structured and unstructured
information. Still, there is likely much leveraging that can occur,
as OODT can be updated to remove Hibernate-like dependencies and
replace them with Derby-like dependencies.

An Excessive Fascination with the Apache Brand

All of us are familiar with Apache and have a respect for its brand
and community. Though none of the proposed committers besides Mattmann
has participated in Apache projects as a committer or PMC
member, many of them (McCleese, Foster, Woollard, Kelly) have
contributed via issue comments, patches, and tests for Apache projects
(including Maven, Tika, SOLR, and Lucene). Furthermore, some of the
proposed committers (Kelly) are major contributors in other open
source communities (e.g., Plone and Python). We feel that the Apache
Software Foundation is a natural home for a project like this. OODT
brings a credible, major grid-based software into the Apache
community, and Apache brings a huge community of eager and world-class
developers to help grow OODT's strengths and applicability across
projects and domains.

Documentation

There is a wealth of documentation available on OODT. The best
starting point is the existing OODT JPL website (which will be ported
to be sync'ed with, or just a pointer to, the Apache
website): http://oodt.jpl.nasa.gov

    * OODT website at JPL
    * Mattmann's OODT paper that appeared at the 28th International
Conference on Software Engineering in Shanghai, China.
    * Crichton's seminal OODT paper appearing at the CODATA conference
at the U.S. National Academies of Science in 2000.
    * Google Scholar search on OODT.

Standards and conventions related to OODT include the Dublin Core
metadata set, ISO/IEC 11179, the HTTP 1.1 RFC, Grid-based standards
including the Open Grid Services Architecture (OGSA), and standards
for science data formats including Hierarchical Data Format (HDF),
netCDF and OPeNDAP.

Initial Source

OODT will start with seed code donated by NASA JPL via Mattmann and
the rest of the initial committers.

Source and Intellectual Property Submission Plan

All seed code and other contributions will be handled through the
normal Apache contribution process. Mattmann has been authorized by
NASA JPL to lead the contribution of OODT into the Incubator via his
existing Apache CLA.

We will also contact other related efforts for possible cooperation
and contributions.

External Dependencies

OODT depends on a number of external connector libraries with various
licensing conditions. An initial list of such dependencies (taken from
one of the OODT sub-components, the CAS file manager) is shown below.

Library | License
commons-codec | AL v2
commons-dbcp | AL v2
commons-httpclient | AL v2
commons-io | AL v2
commons-pool | AL v2
cas-metadata | (to be AL v2)
edm-commons | (to be AL v2)
hsqldb | LGPL v2.1
jug-asl | AL v2
lucene-core | AL v2
xmlrpc | AL v2

There are also some LGPL components that would be useful. Whether and
how such dependencies could be handled will be discussed during
incubation. No such dependencies will be added to the project before
the legal implications have been cleared. Existing LGPL dependencies,
such as hsqldb above for the CAS file manager, will be removed and a
suitable ASL friendly alternative will be investigated and used to
replace the LGPL dependencies.
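
The dependency table above lends itself to a simple mechanical check.
The following sketch, assuming the table is transcribed by hand into a
Python dictionary, lists every library whose license is neither AL v2
nor slated to become AL v2. The function name needs_review and the
ACCEPTED set are illustrative assumptions, not part of any Apache
tooling or policy.

```python
# Libraries and licenses transcribed from the CAS file manager table above.
DEPENDENCIES = {
    "commons-codec": "AL v2",
    "commons-dbcp": "AL v2",
    "commons-httpclient": "AL v2",
    "commons-io": "AL v2",
    "commons-pool": "AL v2",
    "cas-metadata": "(to be AL v2)",
    "edm-commons": "(to be AL v2)",
    "hsqldb": "LGPL v2.1",
    "jug-asl": "AL v2",
    "lucene-core": "AL v2",
    "xmlrpc": "AL v2",
}

# Assumption for this sketch: only these license strings pass without review.
ACCEPTED = {"AL v2", "(to be AL v2)"}


def needs_review(dependencies, accepted=ACCEPTED):
    """Return the sorted names of libraries whose license is not accepted."""
    return sorted(
        name for name, license_name in dependencies.items()
        if license_name not in accepted
    )


if __name__ == "__main__":
    for name in needs_review(DEPENDENCIES):
        print(f"{name}: {DEPENDENCIES[name]}")
```

For the table as listed, the only entry flagged is hsqldb, which matches
the LGPL dependency singled out for replacement above.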

Cryptography

OODT itself will not use cryptography, but it is possible that some of
the external product or profile server or CAS libraries will include
cryptographic code to handle features present in various science data
formats. The current OODT code base relies on Apache Tika which
contains an export control statement regarding cryptographic code per
Apache policy. We will follow a similar approach with OODT. Mattmann
led this effort in Apache Nutch and saw Jukka Zitting lead this effort
in Apache Tika, so he is familiar with this process.

Required Resources

Mailing lists

    * oodt-dev@incubator.apache.org
    * oodt-commits@incubator.apache.org
    * oodt-private@incubator.apache.org

Subversion Directory

    * https://svn.apache.org/repos/asf/incubator/oodt

Issue Tracking

    * JIRA OODT (OODT)

Other Resources

    * OODT Wiki http://cwiki.apache.org/OODT

Initial Committers

Name | Email | Affiliation | CLA

Chris A. Mattmann | mattmann at apache dot org | NASA Jet Propulsion Laboratory | yes
Daniel J. Crichton | crichton at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Paul Ramirez | pramirez at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Sean Kelly | kelly at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Sean Hardman | shardman at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Andrew F. Hart | ahart at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Joshua Garcia | joshua at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
David Woollard | woollard at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Brian Foster | bfoster at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
Sean McCleese | smcclees at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no

Sponsors

Champion

    * Justin Erenkrantz (jerenkrantz at apache dot org)

Nominated Mentors

    * Justin Erenkrantz (jerenkrantz at apache dot org)

Sponsoring Entity

    * Apache Incubator

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by Ian Holsman <li...@holsman.net>.
+1 and I'd love to volunteer to be a mentor.

On 1/5/10 6:46 AM, Justin Erenkrantz wrote:
> Hi general@,
>
> On behalf of the OODT community, I'd like to bring the following
> proposal for discussion within the Incubator.  I've talked with Chris
> and Dave about bringing this project to Apache since May of 2007 (ICSE
> in Minneapolis at the Ruth's Chris Steak house!) - so I'm very happy
> to see this proposal arrive and be a mentor for it.
>
> Comments/suggestions/mentor volunteers welcome.
>
> Thanks!  -- justin
>
> http://wiki.apache.org/incubator/OODTProposal
>
> --- Current Wiki Text below ---
>
> OODT, a grid middleware framework for science data processing,
> information integration, and retrieval.
>
> Abstract
>
> OODT is a grid middleware framework used on a number of successful
> projects at NASA's Jet Propulsion Laboratory/California Institute of
> Technology, and many other research institutions and universities,
> specifically those part of the:
>
>      * National Cancer Institute's (NCI's) Early Detection Research
> Network (EDRN) project - over 40+ institutions all performing research
> into discovering biomarkers which are early indicators of disease.
>      * NASA's Planetary Data System (PDS) - NASA's planetary data
> archive, a repository and registry for all planetary data collected
> over the past 30+ years.
>      * various Earth Science data processing missions, including
> Seawinds/QuickSCAT, the Orbiting Carbon Observatory, the NPP Sounder
> PEATE project, and the Soil Moisture Active Passive (SMAP) mission.
>
> > From the OODT website:
>
> It's middleware for metadata:
>
>      * Transparent access to distributed resources
>      * Data discovery and query optimization
>      * Distributed processing and virtual archives
>
> It's a software architecture:
>
>      * Models for information representation
>      * Solutions to knowledge capture problems
>      * Unification of technology, data, and metadata
>
> Proposal
>
> OODT is an established open source project, with 9+ years of
> existence, and deployment at universities, federal research
> institutions, other NASA centers, and the NIH (it won runner-up NASA
> software of the year in 2003). It has a strong community of those that
> operate and support its growth. Our proposal is to bring OODT into
> Apache to strengthen its support and its capabilities even further on
> the laurels of Apache's brand and its growing huge community of
> developers from all over the world. In short, bringing OODT into
> Apache will significantly enhance OODT's widespread use, will likely
> improve its codebase, and furthermore will help Apache philosophy and
> community spread into OODT's already large community-base reaching
> across government, academia and industry.
>
> OODT will be, to the best of our knowledge, the first grid community
> project to bear the Apache brand. By grid technology, we mean a
> technology that provides the ability to create virtual organizations,
> as originally described by Kesselman and Foster in their seminal paper
> on grid computing. OODT provides both computational and data grid
> support, and is built with a component-philosophy. OODT includes
> components that allow for virtual information integration across
> organizations (provided by the Profile, Product and Query server
> components), and that allow for distributed data management and
> processing across heterogeneous virtual organizations (provided by the
> Catalog and Archive Service set of components, including File Manager,
> Workflow Manager and Resource Manager).
>
> Each set of components exist as independently organized Maven2
> projects, that reference each other (where appropriate), forming a
> layered set of components and a framework for grid computing.
>
> Background
>
> OODT is an established project within JPL and in use at several NASA
> centers, as well as univerities, and other government organizations
> and industrial collaborations. Chris Mattmann, a JPL employee, and ASF
> PMC (Lucene) and Committer (Nutch, Tika), has been working for the
> past 2 years on obtaining the necessary permission from JPL to release
> OODT into Apache. After initially being stalled, JPL has granted
> permission to allow OODT into Apache.
>
> Through his academic relationship with Justin Erekrantz, Apache
> President, and through their collective Ph.D. studies, OODT has been
> discussed between Chris and Justin on several occasions, and Justin
> offered to help champion OODT into the Apache Incubator when JPL was
> ready to release OODT. In December 2009, that permission was granted.
>
> This proposal is the result of the above efforts and related
> discussions. Some alternatives to incubation, like Apache Labs came up
> during the discussions but we believe that taking the project to the
> Incubator is the best way to start growing a viable Apache-based
> community to sustain OODT. Furthermore, given its larger code base and
> existing sub-projects, the goal would be for OODT to leverage the
> incubator to graduate into Apache's first top-level grid project,
> rather than graduate into a sub-project of an existing TLP.
>
> Rationale
>
> Grid computing has been around for the past 10 years and has gained
> widespread notoriety and attention in industry and academia.
> Scientific collaborations are increasingly virtual and require the
> capabilities (data and compute) of thousands of computers and
> resources that span organizations. There are a number of existing grid
> technologies (Globus being the most popular, DSpace, iRODS/Storage
> Resource Broker, see this paper for a full study), however Apache has
> no current grid technology under its umbrella and world reknown think
> tank. Morever, efforts are few and far between in terms of standing up
> Apache-based software that is applicable to the scientific community
> and grid community outside of use of fine-grained components in these
> systems. Other open source organizations (e.g., the Global
> Organization for Earth System Science Portals, GO-ESSP) have embraced
> the construction of such technology and there is a lot of work going
> on, e.g., at NOAA. This proposal aims to remedy this fact and to bring
> scientific data management/grid software into the Apache family and
> its worldwide community.
>
> OODT is a widely successful grid project with applicability and
> existing deployments across broad-reaching domains (planetary and
> earth sciences, cancer research/biomedicine, climate modeling and
> atmospheric science, etc.). The marriage of OODT and Apache will
> engender OODT's widespread, global use via the Apache brand, and will
> make Apache a player in the grid/scientific data community.
>
> Initial Goals
>
> The initial goals of the proposed project are:
>
>      * Stand up a sustaining Apache-based community around the OODT codebase.
>      * Active relationships and possible cooperation with related
> projects and communities.
>      * Refactor and bring up-to-date the OODT profile and product
> server components.
>      * Explore various underlying communication substrates. OODT
> currently uses REST (via its Web-Grid component).
>      * Create configuration-based OODT deployments. Currently the
> deployments are primarily code-based, or the configuration is strewn
> about the various sub-components. The goal would be to bring this
> configuration under a single umbrella project. The idea would be to
> create science data pipelines from configuration.
>      * Explore Python-based client and server implementations of OODT
> and implementations in other languages (Ruby).
>
> Current Status
>
> Meritocracy
>
> Many of the proposed initial committers are familiar with the
> meritocracy principles of Apache, and have already worked on the
> various source codebases (contributing via patches, emails, JIRA
> issues, and in Mattmann's case, as a Nutch, and Tika committer, and
> Lucene PMC member). We will follow the normal meritocracy rules also
> with other potential contributors.
>
> Community
>
> There is an existing, established community of developers and users of
> OODT within over 40 centers at NASA, NIH, DOE and academia, however
> there is no Apache OODT community as of yet. Our principal goal of
> this effort is to leverage the Apache Incubator to grow an Apache
> community base (in addition to OODT's existing community), and to
> build a self-sustaining community around this shared vision, and
> eventual Apache TLP status for OODT. With many sub projects (CAS,
> Product/Profile servers, Query Server, Web-grid, commons, etc.), OODT
> should attract a broad audience of developers with various interests.
>
> Core Developers
>
> The initial set of developers comes from NASA JPL, and with various
> backgrounds, with different but compatible needs for the proposed
> project. JPL is home to data management and grid projects spanning the
> domains of cancer research/bioinformatics, earth science, planetary
> science, astrophysics, and climate modeling.
>
> Alignment
>
> As Apache's first grid-based framework will likely be widely used by
> various open source, scientific and commercial projects both together
> with and independent of other Apache tools. With OODT's existing
> community we will also bring developers and organizations outside of
> Apache into the Apache ecosystem.
>
> Known Risks
>
> Orphaned products
>
> OODT has supported itself through successful deployments at NASA, at
> the U.S. National Institutes of Health (NIH), and recently at
> DOE-based laboratories and at academic centers. Further, OODT has been
> an active participant in IEEE/ACM-based conferences and
> meetings/journal publications over the past 9 years. There is active
> support on several existing NASA earth science missions, and the team
> at JPL is experienced and will continue to champion and develop OODT
> in the Apache area.
>
> Our goal is to take OODT from the early stage of Apache Incubation
> into a thriving Apache top-level project, and leverage it in the
> existing manner at NASA, the NIH, at DOE, and in academia and
> industry. Since OODT is a grid framework, it depends on many external
> services and projects, no one of which controls OODT's code-base.
>
> We feel that the time is ripe to bring OODT into Apache and to grow
> the community of developers who maintain OODT. We feel that Incubation
> will bring a slew of industry-based developers (and even those in
> academia, and government) who have no prior experience with OODT, but
> who could use OODT at their jobs and who are attracted to the brand
> name and community that Apache brings. We want to attract such
> developers to become part of the core OODT development team, and
> project management aspect.
>
> Inexperience with Open Source
>
> All the initial developers have worked on open source before and at
> least one (Mattmann) is a committer and PMC members in the Apache
> Lucene ecosystem. Sean Kelly is a well-respected Plone committer and
> has made several open source contributions over the years to FreeBSD
> and other software. Foster, McCleese and Woollard have all contributed
> to Apache projects by way of email, mailing lists, issue reporting and
> testing.
>
> Homogenous Developers
>
> The initial developers come from a variety of backgrounds and with a
> variety of needs for the proposed framework.
>
> Reliance on Salaried Developers
>
> All of the proposed initial developers are paid to work on this or
> related projects, but the proposed project is not the primary task for
> anyone.
>
> Relationships with Other Apache Products
>
> OODT is related to at least the following Apache projects. None of the
> projects is a direct competitor for OODT, but there are many cases of
> potential overlap in functionality.
>
>      * Apache Lucene - The family of Lucene products that implement
> search services are naturally of use in a grid environment such as
> OODT. In fact, OODT has integrated with many of these projects (Tika,
> SOLR and Lucene-java) already. We see OODT as a grid environment that
> makes use of search services.
>      * Apache UIMA - The UIMA project provides a framework and
> pluggable tools for analyzing text content and extracting information.
> Example tools include language identification, sentence boundary
> detection and "entity extraction" - finding references to people,
> places and organizations. OODT is related to UIMA in the sense that it
> is a framework to provide pluggable connections to content and
> information, but the focus of OODT is on scientific data sets, and
> additional on repositories and catalogs/registries that catalog
> information about those datasets and that store the physical bits.
> Further, OODT is a grid technology, meant to enable the creation of
> virtual organizations, which is not UIMA's focus.Finally, OODT
> contains both an information integration component, as well as a
> science data processing component, which UIMA does not.
>
> OODT is also related to Apache projects involving databases, such as
> the Apache DB project, however scientific data is not limited to
> traditional DBMS'es and involves both structured and un-structured
> information. However, there is likely much leveraging that can occur
> as OODT can be updated to remove Hibernate-like dependencies, and
> replace them with Derby-like dependencies.
>
> A Excessive Fascination with the Apache Brand
>
> All of us are familiar with Apache and have a respect for its brand
> and community. Though none of the proposed committers besides Mattmann
> has participated in Apache projects as a committer or PMC member,
> many of them (McCleese, Foster, Woollard, Kelly) have
> contributed via issue comments, patches, and tests for Apache projects
> (including Maven, Tika, SOLR, and Lucene). Furthermore, some of the
> proposed committers (Kelly) are major contributors in other open
> source communities (e.g., Plone and Python). We feel that the Apache
> Software Foundation is a natural home for a project like this. OODT
> brings a credible, major grid-based software into the Apache
> community, and Apache brings a huge community of eager and world-class
> developers to help grow OODT's strengths and applicability across
> projects and domains.
>
> Documentation
>
> There is a wealth of documentation available on OODT. The best
> starting point is the existing OODT JPL website (which will either be
> kept in sync with, or simply point to, the Apache website):
> http://oodt.jpl.nasa.gov
>
>      * OODT website at JPL
>      * Mattmann's OODT paper that appeared at the 28th International
> Conference on Software Engineering in Shanghai, China.
>      * Crichton's seminal OODT paper appearing at the CODATA conference
> at the U.S. National Academies of Science in 2000.
>      * Google Scholar search on OODT.
>
> Standards and conventions related to OODT include the Dublin Core
> metadata set, ISO/IEC 11179, the HTTP 1.1 RFC, Grid-based standards
> including the Open Grid Services Architecture (OGSA), and standards
> for science data formats including Hierarchical Data Format (HDF),
> netCDF and OPeNDAP.
>
> Initial Source
>
> OODT will start with seed code donated by NASA JPL via Mattmann and
> the rest of the initial committers.
>
> Source and Intellectual Property Submission Plan
>
> All seed code and other contributions will be handled through the
> normal Apache contribution process. Mattmann has been authorized by
> NASA JPL to lead the contribution of OODT into the Incubator via his
> existing Apache CLA.
>
> We will also contact other related efforts for possible cooperation
> and contributions.
>
> External Dependencies
>
> OODT depends on a number of external connector libraries with various
> licensing conditions. An initial list of such dependencies (taken from
> one of the OODT sub-components, the CAS file manager) is shown below.
>
> Library | License
> commons-codec | AL v2
> commons-dbcp | AL v2
> commons-httpclient | AL v2
> commons-io | AL v2
> commons-pool | AL v2
> cas-metadata | (to be AL v2)
> edm-commons | (to be AL v2)
> hsqldb | LGPL v2.1
> jug-asl | AL v2
> lucene-core | AL v2
> xmlrpc | AL v2
>
> There are also some LGPL components that would be useful. Whether and
> how such dependencies could be handled will be discussed during
> incubation. No such dependencies will be added to the project before
> the legal implications have been cleared. Existing LGPL dependencies,
> such as hsqldb above for the CAS file manager, will be removed and a
> suitable ASL friendly alternative will be investigated and used to
> replace the LGPL dependencies.
>
> Cryptography
>
> OODT itself will not use cryptography, but it is possible that some of
> the external product or profile server or CAS libraries will include
> cryptographic code to handle features present in various science data
> formats. The current OODT code base relies on Apache Tika which
> contains an export control statement regarding cryptographic code per
> Apache policy. We will follow a similar approach with OODT. Mattmann
> led this effort in Apache Nutch and saw Jukka Zitting lead this effort
> in Apache Tika, so he is familiar with this process.
>
> Required Resources
>
> Mailing lists
>
>      * oodt-dev@incubator.apache.org
>      * oodt-commits@incubator.apache.org
>      * oodt-private@incubator.apache.org
>
> Subversion Directory
>
>      * https://svn.apache.org/repos/asf/incubator/oodt
>
> Issue Tracking
>
>      * JIRA OODT (OODT)
>
> Other Resources
>
>      * OODT Wiki http://cwiki.apache.org/OODT
>
> Initial Committers
>
> Name | Email | Affiliation | CLA
>
> Chris A. Mattmann | mattmann at apache dot org | NASA Jet Propulsion
> Laboratory | yes
> Daniel J. Crichton | crichton at jpl dot nasa dot gov | NASA Jet
> Propulsion Laboratory | no
> Paul Ramirez | pramirez at jpl dot nasa dot gov | NASA Jet Propulsion
> Laboratory | no
> Sean Kelly | kelly at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Sean Hardman | shardman at jpl dot nasa dot gov | NASA Jet Propulsion
> Laboratory | no
> Andrew F. Hart | ahart at jpl dot nasa dot gov | NASA Jet Propulsion
> Laboratory | no
> Joshua Garcia | joshua at jpl dot nasa dot gov | NASA Jet Propulsion
> Laboratory | no
> David Woollard | woollard at jpl dot nasa dot gov | NASA Jet
> Propulsion Laboratory | no
> Brian Foster | bfoster at jpl dot nasa dot gov | NASA Jet Propulsion
> Laboratory | no
> Sean McCleese | smcclees at jpl dot nasa dot gov | NASA Jet Propulsion
> Laboratory | no
>
> Sponsors
>
> Champion
>
>      * Justin Erenkrantz (jerenkrantz at apache dot org)
>
> Nominated Mentors
>
>      * Justin Erenkrantz (jerenkrantz at apache dot org)
>
> Sponsoring Entity
>
>      * Apache Incubator
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>
>    




Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Joe,

OODT Agility is a Python port of the OODT information integration core architecture (profile servers, product servers and query servers), written by Sean Kelly at JPL. We would like to determine how best to leverage that port as we bring OODT into incubation and would be happy to bring it over if there is interest in doing so.

Thanks for your question!

Cheers,
Chris



On 1/4/10 1:20 PM, "Joe Schaefer" <jo...@yahoo.com> wrote:

What's the relationship with OODT Agility?  Is that
part of the contribution as well, or a separate endeavor?

http://agility.jpl.nasa.gov/








++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by Joe Schaefer <jo...@yahoo.com>.
What's the relationship with OODT Agility?  Is that
part of the contribution as well, or a separate endeavor?

http://agility.jpl.nasa.gov/


      



Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
LOL @ you guys. I will require one of the Sean's to go by an alias, perhaps Panther, or Puma. Or McLovin'

Cheers,
Chris



On 1/5/10 12:24 PM, "Branko Čibej" <br...@xbc.nu> wrote:

Joe Schaefer wrote:
> Having 3 Sean's all working simultaneously on the same project
> would be unprecedented, and probably violates some Incubator rule
> on diversity.  2 Sean's is the max we could accept come
> graduation time.
>

You could try renaming one of them to Shaun, and if that's still not
good enough, another could be Shane in the best tradition of classic
Westerns. :)

-- Brane








Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by Branko Čibej <br...@xbc.nu>.
Joe Schaefer wrote:
> Having 3 Sean's all working simultaneously on the same project
> would be unprecedented, and probably violates some Incubator rule
> on diversity.  2 Sean's is the max we could accept come 
> graduation time.
>   

You could try renaming one of them to Shaun, and if that's still not
good enough, another could be Shane in the best tradition of classic
Westerns. :)

-- Brane




Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by Joe Schaefer <jo...@yahoo.com>.
Having 3 Sean's all working simultaneously on the same project
would be unprecedented, and probably violates some Incubator rule
on diversity.  2 Sean's is the max we could accept come 
graduation time.


----- Original Message ----
> From: "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>
> To: "general@incubator.apache.org" <ge...@incubator.apache.org>
> Sent: Tue, January 5, 2010 10:40:34 AM
> Subject: Re: [PROPOSAL] OODT: a grid middleware framework for science data  processing, information integration, and retrieval
> 
> Hi Jean-Frederic,
> 
> Thanks! I mistakenly took hsql to be "Hibernate SQL" and not "Hypersonic SQL". 
> Thanks for that clarification!
> 
> Cheers,
> Chris
> 
> 
> 
> On 1/5/10 12:34 AM, "jean-frederic clere" wrote:
> 
> On 01/04/2010 08:46 PM, Justin Erenkrantz wrote:
> > hsqldb | LGPL v2.1
> 
> Well http://hsqldb.org/web/hsqlLicense.html
> 
> Cheers
> 
> Jean-Frederic
> 
> 
> 
> 
> 



      



Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Jean-Frederic,

Thanks! I mistakenly took hsql to be "Hibernate SQL" and not "Hypersonic SQL". Thanks for that clarification!

Cheers,
Chris



On 1/5/10 12:34 AM, "jean-frederic clere" <jf...@gmail.com> wrote:

On 01/04/2010 08:46 PM, Justin Erenkrantz wrote:
> hsqldb | LGPL v2.1

Well http://hsqldb.org/web/hsqlLicense.html

Cheers

Jean-Frederic







Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by jean-frederic clere <jf...@gmail.com>.
On 01/04/2010 08:46 PM, Justin Erenkrantz wrote:
> hsqldb | LGPL v2.1

Well http://hsqldb.org/web/hsqlLicense.html

Cheers

Jean-Frederic



Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Wed, Jan 6, 2010 at 11:45 PM, jean-frederic clere <jf...@gmail.com> wrote:
> On 01/06/2010 10:56 AM, Ross Gardler wrote:
>> Count me in as a mentor for this excellent proposal.
>
> If you still need more mentors count on me too.

Thanks - I've added you and Ian to the proposal.  (Ross already added
himself on the wiki.  Thx.)

I'll let this proposal sit over the weekend and then call for a vote
early next week.  -- justin



Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by jean-frederic clere <jf...@gmail.com>.
On 01/06/2010 10:56 AM, Ross Gardler wrote:
> Count me in as a mentor for this excellent proposal.

If you still need more mentors count on me too.

Cheers

Jean-Frederic



Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval

Posted by Ross Gardler <rg...@apache.org>.
Count me in as a mentor for this excellent proposal.

Ross

On 04/01/2010 19:46, Justin Erenkrantz wrote:
> Hi general@,
>
> On behalf of the OODT community, I'd like to bring the following
> proposal for discussion within the Incubator.  I've talked with Chris
> and Dave about bringing this project to Apache since May of 2007 (ICSE
> in Minneapolis at the Ruth's Chris Steak house!) - so I'm very happy
> to see this proposal arrive and be a mentor for it.
>
> Comments/suggestions/mentor volunteers welcome.
>
> Thanks!  -- justin
>
> http://wiki.apache.org/incubator/OODTProposal
>
> --- Current Wiki Text below ---
>
> OODT, a grid middleware framework for science data processing,
> information integration, and retrieval.
>
> Abstract
>
> OODT is a grid middleware framework used on a number of successful
> projects at NASA's Jet Propulsion Laboratory/California Institute of
> Technology, and many other research institutions and universities,
> specifically those part of the:
>
>      * National Cancer Institute's (NCI's) Early Detection Research
> Network (EDRN) project - over 40+ institutions all performing research
> into discovering biomarkers which are early indicators of disease.
>      * NASA's Planetary Data System (PDS) - NASA's planetary data
> archive, a repository and registry for all planetary data collected
> over the past 30+ years.
>      * various Earth Science data processing missions, including
> Seawinds/QuickSCAT, the Orbiting Carbon Observatory, the NPP Sounder
> PEATE project, and the Soil Moisture Active Passive (SMAP) mission.
>
>  From the OODT website:
>
> It's middleware for metadata:
>
>      * Transparent access to distributed resources
>      * Data discovery and query optimization
>      * Distributed processing and virtual archives
>
> It's a software architecture:
>
>      * Models for information representation
>      * Solutions to knowledge capture problems
>      * Unification of technology, data, and metadata
>
> Proposal
>
> OODT is an established open source project, with 9+ years of
> existence, and deployment at universities, federal research
> institutions, other NASA centers, and the NIH (it won runner-up NASA
> software of the year in 2003). It has a strong community of those that
> operate and support its growth. Our proposal is to bring OODT into
> Apache to strengthen its support and its capabilities even further on
> the strength of Apache's brand and its huge, growing community of
> developers from all over the world. In short, bringing OODT into
> Apache will significantly enhance OODT's widespread use, will likely
> improve its codebase, and furthermore will help Apache philosophy and
> community spread into OODT's already large community-base reaching
> across government, academia and industry.
>
> OODT will be, to the best of our knowledge, the first grid community
> project to bear the Apache brand. By grid technology, we mean a
> technology that provides the ability to create virtual organizations,
> as originally described by Kesselman and Foster in their seminal paper
> on grid computing. OODT provides both computational and data grid
> support, and is built with a component-philosophy. OODT includes
> components that allow for virtual information integration across
> organizations (provided by the Profile, Product and Query server
> components), and that allow for distributed data management and
> processing across heterogeneous virtual organizations (provided by the
> Catalog and Archive Service set of components, including File Manager,
> Workflow Manager and Resource Manager).
>
> Each set of components exists as an independently organized Maven2
> project that references the others (where appropriate), forming a
> layered set of components and a framework for grid computing.
>
> Background
>
> OODT is an established project within JPL and in use at several NASA
> centers, as well as universities, and other government organizations
> and industrial collaborations. Chris Mattmann, a JPL employee, and ASF
> PMC (Lucene) and Committer (Nutch, Tika), has been working for the
> past 2 years on obtaining the necessary permission from JPL to release
> OODT into Apache. After initially being stalled, JPL has granted
> permission to allow OODT into Apache.
>
> Through his academic relationship with Justin Erenkrantz, Apache
> President, and through their collective Ph.D. studies, OODT has been
> discussed between Chris and Justin on several occasions, and Justin
> offered to help champion OODT into the Apache Incubator when JPL was
> ready to release OODT. In December 2009, that permission was granted.
>
> This proposal is the result of the above efforts and related
> discussions. Some alternatives to incubation, like Apache Labs came up
> during the discussions but we believe that taking the project to the
> Incubator is the best way to start growing a viable Apache-based
> community to sustain OODT. Furthermore, given its larger code base and
> existing sub-projects, the goal would be for OODT to leverage the
> incubator to graduate into Apache's first top-level grid project,
> rather than graduate into a sub-project of an existing TLP.
>
> Rationale
>
> Grid computing has been around for the past 10 years and has gained
> widespread notoriety and attention in industry and academia.
> Scientific collaborations are increasingly virtual and require the
> capabilities (data and compute) of thousands of computers and
> resources that span organizations. There are a number of existing grid
> technologies (Globus being the most popular, DSpace, iRODS/Storage
> Resource Broker, see this paper for a full study), however Apache has
> no current grid technology under its umbrella and world-renowned think
> tank. Moreover, efforts are few and far between in terms of standing up
> Apache-based software that is applicable to the scientific community
> and grid community outside of use of fine-grained components in these
> systems. Other open source organizations (e.g., the Global
> Organization for Earth System Science Portals, GO-ESSP) have embraced
> the construction of such technology and there is a lot of work going
> on, e.g., at NOAA. This proposal aims to remedy this fact and to bring
> scientific data management/grid software into the Apache family and
> its worldwide community.
>
> OODT is a widely successful grid project with applicability and
> existing deployments across broad-reaching domains (planetary and
> earth sciences, cancer research/biomedicine, climate modeling and
> atmospheric science, etc.). The marriage of OODT and Apache will
> engender OODT's widespread, global use via the Apache brand, and will
> make Apache a player in the grid/scientific data community.
>
> Initial Goals
>
> The initial goals of the proposed project are:
>
>      * Stand up a sustaining Apache-based community around the OODT codebase.
>      * Active relationships and possible cooperation with related
> projects and communities.
>      * Refactor and bring up-to-date the OODT profile and product
> server components.
>      * Explore various underlying communication substrates. OODT
> currently uses REST (via its Web-Grid component).
>      * Create configuration-based OODT deployments. Currently the
> deployments are primarily code-based, or the configuration is strewn
> about the various sub-components. The goal would be to bring this
> configuration under a single umbrella project. The idea would be to
> create science data pipelines from configuration.
>      * Explore Python-based client and server implementations of OODT
> and implementations in other languages (Ruby).
>
> Current Status
>
> Meritocracy
>
> Many of the proposed initial committers are familiar with the
> meritocracy principles of Apache, and have already worked on the
> various source codebases (contributing via patches, emails, JIRA
> issues, and in Mattmann's case, as a Nutch, and Tika committer, and
> Lucene PMC member). We will follow the normal meritocracy rules also
> with other potential contributors.
>
> Community
>
> There is an existing, established community of developers and users of
> OODT within over 40 centers at NASA, NIH, DOE and academia, however
> there is no Apache OODT community as of yet. Our principal goal of
> this effort is to leverage the Apache Incubator to grow an Apache
> community base (in addition to OODT's existing community), and to
> build a self-sustaining community around this shared vision, and
> eventual Apache TLP status for OODT. With many sub projects (CAS,
> Product/Profile servers, Query Server, Web-grid, commons, etc.), OODT
> should attract a broad audience of developers with various interests.
>
> Core Developers
>
> The initial set of developers comes from NASA JPL, and with various
> backgrounds, with different but compatible needs for the proposed
> project. JPL is home to data management and grid projects spanning the
> domains of cancer research/bioinformatics, earth science, planetary
> science, astrophysics, and climate modeling.
>
> Alignment
>
> As Apache's first grid-based framework, OODT will likely be widely used
> by various open source, scientific and commercial projects, both together
> with and independent of other Apache tools. With OODT's existing
> community we will also bring developers and organizations outside of
> Apache into the Apache ecosystem.
>
> Known Risks
>
> Orphaned products
>
> OODT has supported itself through successful deployments at NASA, at
> the U.S. National Institutes of Health (NIH), and recently at
> DOE-based laboratories and at academic centers. Further, OODT has been
> an active participant in IEEE/ACM-based conferences and
> meetings/journal publications over the past 9 years. There is active
> support on several existing NASA earth science missions, and the team
> at JPL is experienced and will continue to champion and develop OODT
> in the Apache area.
>
> Our goal is to take OODT from the early stage of Apache Incubation
> into a thriving Apache top-level project, and leverage it in the
> existing manner at NASA, the NIH, at DOE, and in academia and
> industry. Since OODT is a grid framework, it depends on many external
> services and projects, no one of which controls OODT's code-base.
>
> We feel that the time is ripe to bring OODT into Apache and to grow
> the community of developers who maintain OODT. We feel that Incubation
> will bring a slew of industry-based developers (and even those in
> academia, and government) who have no prior experience with OODT, but
> who could use OODT at their jobs and who are attracted to the brand
> name and community that Apache brings. We want to attract such
> developers to become part of the core OODT development team, and
> project management aspect.
>
> Inexperience with Open Source
>
> All the initial developers have worked on open source before and at
> least one (Mattmann) is a committer and PMC member in the Apache
> Lucene ecosystem. Sean Kelly is a well-respected Plone committer and
> has made several open source contributions over the years to FreeBSD
> and other software. Foster, McCleese and Woollard have all contributed
> to Apache projects by way of email, mailing lists, issue reporting and
> testing.
>
> Homogenous Developers
>
> The initial developers come from a variety of backgrounds and with a
> variety of needs for the proposed framework.
>
> Reliance on Salaried Developers
>
> All of the proposed initial developers are paid to work on this or
> related projects, but the proposed project is not the primary task for
> anyone.
>
> Relationships with Other Apache Products
>
> OODT is related to at least the following Apache projects. None of the
> projects is a direct competitor for OODT, but there are many cases of
> potential overlap in functionality.
>
>      * Apache Lucene - The family of Lucene products that implement
> search services are naturally of use in a grid environment such as
> OODT. In fact, OODT has integrated with many of these projects (Tika,
> SOLR and Lucene-java) already. We see OODT as a grid environment that
> makes use of search services.
>      * Apache UIMA - The UIMA project provides a framework and
> pluggable tools for analyzing text content and extracting information.
> Example tools include language identification, sentence boundary
> detection and "entity extraction" - finding references to people,
> places and organizations. OODT is related to UIMA in the sense that it
> is a framework to provide pluggable connections to content and
> information, but the focus of OODT is on scientific data sets, and
> additionally on repositories and catalogs/registries that catalog
> information about those datasets and that store the physical bits.
> Further, OODT is a grid technology, meant to enable the creation of
> virtual organizations, which is not UIMA's focus. Finally, OODT
> contains both an information integration component, as well as a
> science data processing component, which UIMA does not.
>
> OODT is also related to Apache projects involving databases, such as
> the Apache DB project; however, scientific data is not limited to
> traditional DBMSes and involves both structured and unstructured
> information. Even so, there is likely much leveraging that can occur,
> as OODT can be updated to remove Hibernate-like dependencies and
> replace them with Derby-like dependencies.
>
> An Excessive Fascination with the Apache Brand
>
> All of us are familiar with Apache and have a respect for its brand
> and community. Though none of the proposed committers besides Mattmann
> has participated in Apache projects as a committer or PMC member,
> many of them (McCleese, Foster, Woollard, Kelly) have
> contributed via issue comments, patches, and tests for Apache projects
> (including Maven, Tika, SOLR, and Lucene). Furthermore, some of the
> proposed committers (Kelly) are major contributors in other open
> source communities (e.g., Plone and Python). We feel that the Apache
> Software Foundation is a natural home for a project like this. OODT
> brings a credible, major grid-based software into the Apache
> community, and Apache brings a huge community of eager and world-class
> developers to help grow OODT's strengths and applicability across
> projects and domains.
>
> Documentation
>
> There is a wealth of documentation available on OODT. The best
> starting point is the existing OODT JPL website (which will either be
> kept in sync with, or simply point to, the Apache website):
> http://oodt.jpl.nasa.gov
>
>      * OODT website at JPL
>      * Mattmann's OODT paper that appeared at the 28th International
> Conference on Software Engineering in Shanghai, China.
>      * Crichton's seminal OODT paper appearing at the CODATA conference
> at the U.S. National Academies of Science in 2000.
>      * Google Scholar search on OODT.
>
> Standards and conventions related to OODT include the Dublin Core
> metadata set, ISO/IEC 11179, the HTTP 1.1 RFC, Grid-based standards
> including the Open Grid Services Architecture (OGSA), and standards
> for science data formats including Hierarchical Data Format (HDF),
> netCDF and OPeNDAP.
>
> Initial Source
>
> OODT will start with seed code donated by NASA JPL via Mattmann and
> the rest of the initial committers.
>
> Source and Intellectual Property Submission Plan
>
> All seed code and other contributions will be handled through the
> normal Apache contribution process. Mattmann has been authorized by
> NASA JPL to lead the contribution of OODT into the Incubator via his
> existing Apache CLA.
>
> We will also contact other related efforts for possible cooperation
> and contributions.
>
> External Dependencies
>
> OODT depends on a number of external connector libraries with various
> licensing conditions. An initial list of such dependencies (taken from
> one of the OODT sub-components, the CAS file manager) is shown below.
>
> Library | License
> commons-codec | AL v2
> commons-dbcp | AL v2
> commons-httpclient | AL v2
> commons-io | AL v2
> commons-pool | AL v2
> cas-metadata | (to be AL v2)
> edm-commons | (to be AL v2)
> hsqldb | LGPL v2.1
> jug-asl | AL v2
> lucene-core | AL v2
> xmlrpc | AL v2
>
> There are also some LGPL components that would be useful. Whether and
> how such dependencies could be handled will be discussed during
> incubation. No such dependencies will be added to the project before
> the legal implications have been cleared. Existing LGPL dependencies,
> such as hsqldb above for the CAS file manager, will be removed and
> replaced with suitable ASL-friendly alternatives.
>
> Cryptography
>
> OODT itself will not use cryptography, but it is possible that some of
> the external product or profile server or CAS libraries will include
> cryptographic code to handle features present in various science data
> formats. The current OODT code base relies on Apache Tika which
> contains an export control statement regarding cryptographic code per
> Apache policy. We will follow a similar approach with OODT. Mattmann
> led this effort in Apache Nutch and saw Jukka Zitting lead this effort
> in Apache Tika, so he is familiar with this process.
>
> Required Resources
>
> Mailing lists
>
>      * oodt-dev@incubator.apache.org
>      * oodt-commits@incubator.apache.org
>      * oodt-private@incubator.apache.org
>
> Subversion Directory
>
>      * https://svn.apache.org/repos/asf/incubator/oodt
>
> Issue Tracking
>
>      * JIRA OODT (OODT)
>
> Other Resources
>
>      * OODT Wiki http://cwiki.apache.org/OODT
>
> Initial Committers
>
> Name | Email | Affiliation | CLA
>
> Chris A. Mattmann | mattmann at apache dot org | NASA Jet Propulsion Laboratory | yes
> Daniel J. Crichton | crichton at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Paul Ramirez | pramirez at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Sean Kelly | kelly at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Sean Hardman | shardman at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Andrew F. Hart | ahart at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Joshua Garcia | joshua at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> David Woollard | woollard at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Brian Foster | bfoster at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
> Sean McCleese | smcclees at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no
>
> Sponsors
>
> Champion
>
>      * Justin Erenkrantz (jerenkrantz at apache dot org)
>
> Nominated Mentors
>
>      * Justin Erenkrantz (jerenkrantz at apache dot org)
>
> Sponsoring Entity
>
>      * Apache Incubator
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>

